If I understand you correctly, you can use a regular expression to change the text. Consider this example:
from bs4 import BeautifulSoup
html_text = """\
<body>
<p>Lorem ipsum dolor sit amet,
consectetur adipiscing elit.
Maecenas sed mi lacus.
<span>This is inner span.</span>
Vivamus luctus vehicula lacus,
ut malesuada justo posuere et.
Donec ut diam volutpat</p>
</body>"""
soup = BeautifulSoup(html_text, "html.parser")
print(soup.p.text)
Prints:
Lorem ipsum dolor sit amet,
consectetur adipiscing elit.
Maecenas sed mi lacus.
This is inner span.
Vivamus luctus vehicula lacus,
ut malesuada justo posuere et.
Donec ut diam volutpat
You can do:
import re
# \s+ collapses every whitespace run, including a lone newline, into one space
print(re.sub(r"\s+", " ", soup.p.text))
This prints:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sed mi lacus. This is inner span. Vivamus luctus vehicula lacus, ut malesuada justo posuere et. Donec ut diam volutpat
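As a variation on the regex approach, here is a minimal sketch of a reusable helper. The name `normalize_whitespace` and the `sample` string are made up for illustration; `sample` only stands in for `soup.p.text`, so the snippet runs without BeautifulSoup. It also shows a regex-free alternative using `str.split()`, which should give the same result for ordinary text like this:

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse each run of whitespace (spaces, tabs, newlines) into one space."""
    # strip() removes any leading/trailing space left over after the substitution
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical stand-in for soup.p.text, with mixed newlines, spaces and a tab
sample = "Lorem ipsum dolor sit amet,\n    consectetur adipiscing elit.\n\tMaecenas sed mi lacus.\n"

print(normalize_whitespace(sample))

# Regex-free alternative: split on any whitespace, then rejoin with single spaces
print(" ".join(sample.split()))
```

Both calls print the sample as a single line. In the example above you would pass `soup.p.text` instead of `sample`.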