Filling in a placeholder value with BeautifulSoup when a tag is empty

Question (votes: 2, answers: 2)

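I am trying to parse the contents of all td tags of a certain class on a web page, but I want a placeholder value in the result even when a tag itself has no content. For example, the HTML contains td tags like these: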

<td class="odds bdevtt moneylineodds " cfg="">+134</td>
<td class="odds bdevtt moneylineodds " cfg=""></td>
<td class="odds bdevtt moneylineodds " cfg="">-140</td>

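I am trying to get a list like ['+134', '-', '-140'] as output, so that the number of entries in the list equals the number of matching tags, with '-' as a placeholder indicating that a tag is empty. However, the following only returns ['+134', '-140'].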

soup.find_all('td', attrs={'class': 'odds bdevtt moneylineodds '})
python html parsing beautifulsoup tags
2 Answers
0 votes
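from bs4 import BeautifulSoup

html = """
<td class="odds bdevtt moneylineodds " cfg="">+134</td>
<td class="odds bdevtt moneylineodds " cfg=""></td>
<td class="odds bdevtt moneylineodds " cfg="">-140</td>
"""
soup = BeautifulSoup(html, "html.parser")

# Search without the trailing space: recent bs4 versions drop it from the parsed class value.
cells = soup.find_all('td', attrs={'class': 'odds bdevtt moneylineodds'})
# Keep the cell text, or use "-" as a placeholder when the cell is empty.
# (The original answer named this list `all`, which shadows the built-in.)
odds = [td.text if td.text != "" else "-" for td in cells]
print(odds)

# output: ['+134', '-', '-140']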
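
The comparison against "" above only catches cells that are completely empty. As a minimal sketch, assuming the real page may also contain cells holding only whitespace (an assumption, since the question shows only a fully empty cell), get_text(strip=True) can be combined with `or` to cover both cases. The helper name cell_text_or_placeholder is hypothetical, and the cells are matched on the single class token moneylineodds so the trailing space in the attribute does not matter.

```python
from bs4 import BeautifulSoup

# Same markup as in the question, except the middle cell holds only
# whitespace (an assumed variation, not taken from the original page).
html = """
<td class="odds bdevtt moneylineodds " cfg="">+134</td>
<td class="odds bdevtt moneylineodds " cfg="">   </td>
<td class="odds bdevtt moneylineodds " cfg="">-140</td>
"""
soup = BeautifulSoup(html, "html.parser")


def cell_text_or_placeholder(tag, placeholder="-"):
    """Return the stripped text of a tag, or the placeholder if it is blank."""
    # get_text(strip=True) strips each text fragment and skips blank ones,
    # so an empty or whitespace-only cell yields "" and falls through to `or`.
    return tag.get_text(strip=True) or placeholder


# class_ matches a single class token, independent of attribute whitespace.
odds = [cell_text_or_placeholder(td)
        for td in soup.find_all("td", class_="moneylineodds")]
print(odds)  # ['+134', '-', '-140']
```

Matching on one token is looser than matching the full class string, so on a page where other cells also carry moneylineodds the search would need to be narrowed.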

0 votes

Remove the trailing space from the value of the class attribute and you will get the expected result.

Code:

for elm in soup.find_all('td', attrs={'class': 'odds bdevtt moneylineodds'}):
  print(elm.text)

Output:

+134

-140

The reason is that, when the code runs, the parser strips the trailing space from the class value:

html = """
<td class="odds bdevtt moneylineodds " cfg="">+134</td>
<td class="odds bdevtt moneylineodds " cfg=""></td>
<td class="odds bdevtt moneylineodds " cfg="">-140</td>
"""
soup = BeautifulSoup(html,"html.parser")   # <-- It will trim the trailing spaces from class value
print(soup)

Output:

<td cfg="" class="odds bdevtt moneylineodds">+134</td>
<td cfg="" class="odds bdevtt moneylineodds"></td>
<td cfg="" class="odds bdevtt moneylineodds">-140</td>
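
The normalization shown above happens because Beautiful Soup treats class as a multi-valued attribute and stores it as a list of tokens, not as the raw string. The sketch below is only an illustration of that behavior: it inspects the parsed value and shows two ways of searching that do not depend on whitespace in the attribute, matching a single token with class_ and using a CSS selector with select(). The variable names are hypothetical.

```python
from bs4 import BeautifulSoup

html = '<td class="odds bdevtt moneylineodds " cfg=""></td>'
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

# The class attribute is exposed as a list of tokens, not the raw string.
print(td["class"])

# Matching one token works regardless of spacing in the attribute.
by_token = soup.find_all("td", class_="moneylineodds")

# A CSS selector requires all three classes, in any order.
by_selector = soup.select("td.odds.bdevtt.moneylineodds")

print(len(by_token), len(by_selector))
```

The selector form is the closest equivalent to searching for the full class string, since it requires every listed class to be present.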