How do I extract the following information using BeautifulSoup?

Question

Here is the HTML source:

<div class="clg-type clgAtt"><label class="label">Ownership: </label>Private</div>
<div class="clg-type clgAtt"><label class="label">Institute Type: </label> Affiliated College</div>

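How can I extract only "Private" and "Affiliated College" from the source above? The two values should also end up in two separate variables. The approach should be scalable, so that it can pull the same information out of a much larger source in which this HTML pattern repeats.

By scalable I mean something like this: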

college1<div class="clg-type clgAtt"><label class="label">Ownership: </label>Private</div>
<div class="clg-type clgAtt"><label class="label">Institute Type: </label> Affiliated College</div>
college2<div class="clg-type clgAtt"><label class="label">Ownership: </label>Private</div>
<div class="clg-type clgAtt"><label class="label">Institute Type: </label> Affiliated College</div>

So I think I need to extract "Private" and "Affiliated College" for both of these entities.

Answer 1

You can find the <div> tags that contain the text you want:

soup.find_all('div', class_='clg-type clgAtt')

Since the first tag holds the ownership and the second holds the institute type, you can assign them like this:

ownership, institute = soup.find_all('div', class_='clg-type clgAtt')

Now, if you print the contents of either variable, for example print(ownership.contents), you will see:

[<label class="label">Ownership: </label>, 'Private']

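So you can use contents[1] or contents[-1] to get the text you need: the <label> tag sits at index 0, and the text sits at index 1, which is also the last element of contents.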

Full code:

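from bs4 import BeautifulSoup

html = '''
<div class="clg-type clgAtt"><label class="label">Ownership: </label>Private</div>
<div class="clg-type clgAtt"><label class="label">Institute Type: </label>Affiliated College</div>
'''
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(html, 'lxml')

# contents[0] is the <label> tag, contents[1] is the text that follows it.
# Unpacking into two names only works when exactly two <div> tags match.
ownership, institute = [x.contents[1] for x in soup.find_all('div', class_='clg-type clgAtt')]
print(ownership, institute, sep='\n')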

Output:

Private
Affiliated College
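
The question also asks for something that scales to a source where the same pair of tags repeats for many colleges, and the two-name unpacking above raises a ValueError as soon as more than two <div> tags match. Below is a minimal sketch of one way to handle that, not part of the original answer. It assumes each college's attributes appear consecutively and that a repeated label (such as "Ownership") marks the start of the next college. The helper name extract_records is hypothetical, and html.parser is used here only so that the sketch has no dependency beyond bs4.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''
college1<div class="clg-type clgAtt"><label class="label">Ownership: </label>Private</div>
<div class="clg-type clgAtt"><label class="label">Institute Type: </label> Affiliated College</div>
college2<div class="clg-type clgAtt"><label class="label">Ownership: </label>Private</div>
<div class="clg-type clgAtt"><label class="label">Institute Type: </label> Affiliated College</div>
'''


def extract_records(markup):
    """Hypothetical helper: return one dict of {label: value} per college."""
    soup = BeautifulSoup(markup, 'html.parser')
    records = []
    current = {}
    for div in soup.find_all('div', class_='clg-type clgAtt'):
        label = div.find('label')
        if label is None:
            continue  # skip divs that do not follow the label/value pattern
        # "Ownership: " -> "Ownership"
        key = label.get_text(strip=True).rstrip(':')
        # Join only the text nodes that are direct children of the div,
        # which leaves out the text inside <label>
        value = ''.join(
            child for child in div.children
            if isinstance(child, NavigableString)
        ).strip()
        # Assumption: seeing the same label again means a new college starts
        if key in current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


records = extract_records(html)
for record in records:
    print(record['Ownership'], record['Institute Type'], sep='\n')
```

Each record is a plain dict, so the two values for a given college are available separately as record['Ownership'] and record['Institute Type'].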

Answer 2

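The following approach is destructive to the soup, since it removes the <label> tags from the parsed tree, but you can work around that by copying each tag after finding it.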

from bs4 import BeautifulSoup as bs

html = '''<div class="clg-type clgAtt"><label class="label">Ownership: </label>Private</div>
<div class="clg-type clgAtt"><label class="label">Institute Type: </label> Affiliated College</div>'''

soup = bs(html, 'lxml')
divs = soup.find_all('div')
text = []
for div in divs:
    # Remove the <label> from the tree so only the value text remains
    div.label.extract()
    # div.text keeps surrounding whitespace; call .strip() to drop it
    text.append(div.text)
print(text)

Result:

['Private', ' Affiliated College']
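
The copy-based workaround mentioned at the top of this answer could look roughly like the sketch below. It is an illustration rather than the answerer's own code: it relies on copy.copy() producing a detached copy of a Tag, swaps in html.parser to avoid the lxml dependency, and strips whitespace from the result, so the output differs slightly from the one shown above.

```python
import copy

from bs4 import BeautifulSoup

html = '''<div class="clg-type clgAtt"><label class="label">Ownership: </label>Private</div>
<div class="clg-type clgAtt"><label class="label">Institute Type: </label> Affiliated College</div>'''

soup = BeautifulSoup(html, 'html.parser')

text = []
for div in soup.find_all('div'):
    # Work on a detached copy so the original soup keeps its <label> tags
    clone = copy.copy(div)
    if clone.label is not None:
        clone.label.extract()
    text.append(clone.get_text(strip=True))

print(text)  # ['Private', 'Affiliated College']

# The original tree is untouched: both labels are still present
print(len(soup.find_all('label')))  # 2
```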
