Is there a way to remove all navigable strings and keep only the tags in a soup object?
Do you mean something like this?
from bs4 import BeautifulSoup
soup = BeautifulSoup(contents, features="html.parser")
for child in soup.descendants:
    # NavigableString objects have name None, so only tags are printed
    if child.name:
        print(child.name)
Output
html
head
title
meta
body
h2
p
ul
li
li
li
li
li
Yes:
from bs4 import BeautifulSoup, NavigableString
soup = BeautifulSoup(html_page, 'html.parser')
element = soup.find()
strings = []
while element is not None:
    if isinstance(element, NavigableString):
        strings.append(element)
    element = element.next_element
for element in strings:
    element.extract()
print(soup)
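First, collect all the NavigableString objects, then extract them all.
You might be tempted to extract the elements directly inside the loop, like this: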
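A shorter way to get the same collect-then-extract behaviour, as a rough sketch: find_all(string=True) builds the complete list of strings before anything is removed. The helper name strip_strings and the sample markup are made up for illustration, and the string= argument assumes a reasonably recent Beautiful Soup 4 (older releases called it text=).

```python
from bs4 import BeautifulSoup, NavigableString

def strip_strings(soup):
    """Remove every NavigableString from soup in place, keeping the tags."""
    # find_all returns a fully built list, so extracting items
    # while looping over it does not disturb the iteration.
    for string in soup.find_all(string=True):
        string.extract()
    return soup

# Hypothetical sample markup, used only for illustration.
sample = "<html><body><h2>Title</h2><p>Some <b>bold</b> text</p></body></html>"
soup = strip_strings(BeautifulSoup(sample, "html.parser"))

# Anything left in the tree should now be a tag.
remaining = [d for d in soup.descendants if isinstance(d, NavigableString)]
tag_names = [d.name for d in soup.descendants]
print(tag_names)
```

Note that this also removes comments and similar nodes, since they are NavigableString subclasses.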
while element is not None:
    if isinstance(element, NavigableString):
        element.extract()
    element = element.next_element
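This does not work, because an extracted navigable string no longer has a next_element once it has left the soup, so the traversal cannot continue past it.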
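If a single pass is preferred, one possible workaround (a sketch, not part of the original answer) is to remember the successor before detaching the current node. The function name strip_strings_single_pass and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString

def strip_strings_single_pass(soup):
    """Walk the tree once, extracting each NavigableString as it is met."""
    element = soup.find()  # first tag in the document
    while element is not None:
        # Save the successor first: extract() clears next_element
        # on the node that is being removed.
        following = element.next_element
        if isinstance(element, NavigableString):
            element.extract()
        element = following
    return soup

# Hypothetical sample markup, used only for illustration.
sample = "<html><body><h2>Title</h2><p>Some <b>bold</b> text</p></body></html>"
single_pass_soup = strip_strings_single_pass(BeautifulSoup(sample, "html.parser"))
leftover = [d for d in single_pass_soup.descendants
            if isinstance(d, NavigableString)]
kept_tags = [d.name for d in single_pass_soup.descendants]
print(kept_tags)
```

Like the original answer, this starts from soup.find(), so any strings that appear before the first tag are not visited.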