How do I remove all NavigableStrings from a soup object?

Question (votes: 0, answers: 2)

Is there a way to remove all NavigableStrings from a soup object and keep only the tags?

python python-3.x beautifulsoup
2 Answers
0
votes

Do you mean something like this?

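    from bs4 import BeautifulSoup

    # `contents` holds the HTML source to parse
    soup = BeautifulSoup(contents, features="html.parser")

    # Print the name of every tag; strings have no tag name and are skipped
    for child in soup.descendants:
        if child.name:
            print(child.name)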

Output

html
head
title
meta
body
h2
p
ul
li
li
li
li
li

0
votes

Yes:

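from bs4 import BeautifulSoup, NavigableString


# `html_page` holds the HTML source to parse
soup = BeautifulSoup(html_page, 'html.parser')
element = soup.find()  # first tag in the document
strings = []
# First pass: walk the tree in document order and collect every string
while element is not None:
    if isinstance(element, NavigableString):
        strings.append(element)
    element = element.next_element
# Second pass: detach the collected strings from the tree
for element in strings:
    element.extract()
print(soup)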

First collect all the NavigableStrings, then extract them all.
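The same collect-then-extract idea can probably be written more compactly with `find_all(string=True)`, which builds the full list of strings before anything is removed. This is only a sketch, not the answerer's code: the sample `html_page` below is a made-up input, and `string=True` also matches comments and other NavigableString subclasses.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input, not taken from the original question.
html_page = "<html><body><h2>Title</h2><p>Some <b>bold</b> text</p></body></html>"

soup = BeautifulSoup(html_page, "html.parser")

# find_all(string=True) returns a list of every string in the tree.
# The list is complete before the loop starts, so extracting is safe.
for string in soup.find_all(string=True):
    string.extract()

result = str(soup)
print(result)  # <html><body><h2></h2><p><b></b></p></body></html>
```

Only the tags remain; the now-empty `h2`, `p` and `b` elements stay in the tree.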

Extracting the elements directly inside the loop, like this:

while element is not None:
    if isinstance(element, NavigableString):
        element.extract()
    element = element.next_element

does not work, because an extracted NavigableString no longer has a next_element: it is no longer in the soup.
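A small sketch of this behaviour, using a made-up `html_page`: after `extract()` the string's `next_element` is `None`, so a single-pass loop would stop at the first string. One possible workaround (an assumption here, not part of the answer above) is to remember the successor before extracting.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample input, not taken from the original question.
html_page = "<p>one<b>two</b>three</p>"
soup = BeautifulSoup(html_page, "html.parser")

# Extract the first string and look at its forward pointer afterwards.
first = soup.find(string=True)
first.extract()
detached_next = first.next_element
print(detached_next)  # None: the string is no longer linked into the tree

# Single-pass variant: save the next element *before* extracting.
element = soup.find()
while element is not None:
    following = element.next_element
    if isinstance(element, NavigableString):
        element.extract()
    element = following

result = str(soup)
print(result)  # <p><b></b></p>
```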
