Trying to grab a specific HTML table when web scraping

Question

I am trying to scrape data from the following URL: https://www.pro-football-reference.com/boxscores/201809060phi.htm

Specifically, I want the information in the "Passing, Rushing, & Receiving" table. I have the following code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# assigning url
my_url = 'https://www.pro-football-reference.com/boxscores/201809060phi.htm'

# opening up connection, grabbing the page
raw_page = uReq(my_url)
page_html = raw_page.read()
raw_page.close()

# html parsing
page_soup = soup(page_html,"html.parser")

# assign variable to stat table
stat_table = page_soup.find("div",{"id":"all_player_offense"})
inner_table = stat_table.findAll("tr")
print(len(inner_table))

It should print the number of player rows in that table. The output I get is 0 instead of the 17 I expected.

Answer 1

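You are selecting the table's parent div rather than the table itself. Inspect the page's HTML markup carefully and you will find the table's own ID.

Also note that the table puts its rows inside a tbody rather than listing them directly under the table element, so you have to account for that as well.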
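A minimal sketch of that advice, run against a small inline HTML sample instead of the live page so it works offline. The table id `player_offense` and the sample markup are assumptions made for illustration; check the real page source for the actual id and structure.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup standing in for the real page.
# The id "player_offense" is an assumption, not confirmed by the question.
sample_html = """
<div id="all_player_offense">
  <table id="player_offense">
    <thead><tr><th>Player</th><th>Yds</th></tr></thead>
    <tbody>
      <tr><th>Player A</th><td>100</td></tr>
      <tr><th>Player B</th><td>50</td></tr>
    </tbody>
  </table>
</div>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# Select the table element by its own id, not the wrapping div.
stat_table = page_soup.find("table", {"id": "player_offense"})

# Count only the rows inside tbody so the header row in thead is excluded.
body = stat_table.find("tbody") if stat_table is not None else None
player_rows = body.find_all("tr") if body is not None else []

print(len(player_rows))
```

The `None` checks keep the sketch from raising an exception if the id is not present in the HTML that was actually downloaded; in that case it prints 0, which is a cue to look at the raw page source again.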
