Can't get the content of different "h" tags using a selector

Question

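I'm trying to get the titles from several h tags in some HTML elements. The h tags always have a number appended, such as h1, h14, h17. I know I can use .select("h1,h11,h9") to get them, but there are many of them. If they were written like class="heading1", class="heading2", class="heading3", I could handle them with .select("[class^='heading']").

How can I get the content of all the different h tags using a selector?

My attempt: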

htmlelements="""
<h1>
    <a href="https://somesite.com/">SEC fight</a>
</h1>
<h11>
    <a href="https://somesite.com/">AFC fight</a>
</h11>
<h9>
    <a href="https://somesite.com/">UTY fight</a>
</h9>
"""

from bs4 import BeautifulSoup

page = BeautifulSoup(htmlelements, "lxml")
for item in page.select("h11"):
    print(item.text)

P.S. Using a regular expression, as in .find_all(string=re.compile("h")), is not an option.

Answer 1

One approach is to pass every possible h tag name to .find_all():

htmlelements="""
<h1>
    <a href="https://somesite.com/">SEC fight</a>
</h1>
<h11>
    <a href="https://somesite.com/">AFC fight</a>
</h11>
<h9>
    <a href="https://somesite.com/">UTY fight</a>
</h9>
"""

from bs4 import BeautifulSoup

page = BeautifulSoup(htmlelements, "lxml")

# Pass a list of tag names (h1 ... h19); find_all() matches any of them
for item in page.find_all([f"h{h}" for h in range(1, 20)]):
    print(item.get_text(strip=True))

This prints:

SEC fight
AFC fight
UTY fight
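
Since the question asks specifically about selectors, here is a minimal sketch of two related options. The first builds the comma-separated selector in code instead of typing it out; the second passes a function to .find_all() so no upper bound is needed. The upper bound of 20, the helper name is_numbered_h, and the choice of "html.parser" (used only to avoid the lxml dependency) are assumptions for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

htmlelements = """
<h1>
    <a href="https://somesite.com/">SEC fight</a>
</h1>
<h11>
    <a href="https://somesite.com/">AFC fight</a>
</h11>
<h9>
    <a href="https://somesite.com/">UTY fight</a>
</h9>
"""

# "html.parser" is an assumption here; "lxml" from the question should work too
page = BeautifulSoup(htmlelements, "html.parser")

# Option 1: generate the selector "h1,h2,...,h19" rather than writing it by hand.
# The upper bound of 20 is an assumed limit, as in the answer above.
selector = ",".join(f"h{n}" for n in range(1, 20))
selector_titles = [tag.get_text(strip=True) for tag in page.select(selector)]

# Option 2: a function filter that accepts any tag named "h" followed by digits.
# is_numbered_h is a hypothetical helper name.
def is_numbered_h(tag):
    return tag.name.startswith("h") and tag.name[1:].isdigit()

filter_titles = [tag.get_text(strip=True) for tag in page.find_all(is_numbered_h)]

print(selector_titles)
print(filter_titles)
```

Both lists come back in document order. The function filter skips tags such as html and head because the part after the "h" must consist only of digits.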
