pandas can't find a table even though it appears on the web page


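This is my code from JupyterLab. I'm trying to get pandas to output the "Advanced Receiving" table the same way it outputs "Advanced Passing", but I keep getting this error: "ValueError: No tables found matching pattern 'Advanced Receiving'". This is the page that contains all the tables: https://www.pro-football-reference.com/teams/buf/2023_advanced.htm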

import requests
from bs4 import BeautifulSoup


standings_url = "https://www.pro-football-reference.com/years/2023/index.htm"



data = requests.get(standings_url)
soup = BeautifulSoup(data.text, 'html5lib')  # name the parser explicitly, as the later call does

standings_table1 = soup.select('table.stats_table')[0]
standings_table2 = soup.select('table.stats_table')[1]
links1 = standings_table1.find_all('a')
links2 = standings_table2.find_all('a')
all_links = [l.get('href') for l in links1] + [l.get('href') for l in links2]
team_links = [l for l in all_links if '/teams/' in l]

base_url = "https://www.pro-football-reference.com"
full_team_links = [base_url + l for l in team_links]  # use the filtered team links, not all_links
full_team_links

team_url = full_team_links[0]
data = requests.get(team_url)
data.text

import pandas as pd
games = pd.read_html(data.text, match="Schedule & Game Results")
games[0]

soup = BeautifulSoup(data.text, 'html5lib')
links = soup.find_all('a')
links =[l.get("href") for l in links]
links = [l for l in links if l and '/2023_advanced.htm' in l]
links

data = requests.get(f"https://www.pro-football-reference.com{links[0]}")
data.text

passing = pd.read_html(data.text, match="Advanced Passing")[0]
receiving = pd.read_html(data.text, match="Advanced Receiving")[0]  # this call raises the ValueError
passing.head()
receiving.head()

I checked whether the content is loaded dynamically and whether my table logic was incorrect.
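
One way to tell these cases apart is to inspect the raw HTML that requests returns. The sketch below uses a hypothetical helper (locate_table is not part of pandas or BeautifulSoup) and assumes the table's id attribute is written with double quotes. The ids advanced_passing and advanced_receiving are assumed here for illustration. It reports whether a table sits in the normal markup, only inside an HTML comment, or is absent from the response, which would suggest it is added later by JavaScript.

```python
import re

# Matches an HTML comment, including comments that span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def locate_table(html: str, table_id: str) -> str:
    """Return 'visible', 'commented', or 'missing' for the given table id.

    Assumption: the id appears in the markup exactly as id="<table_id>".
    """
    marker = f'id="{table_id}"'
    if marker not in html:
        # Not in the server response at all (possibly injected by JavaScript).
        return "missing"
    # Drop every comment and check whether the id survives.
    visible_html = COMMENT_RE.sub("", html)
    return "visible" if marker in visible_html else "commented"


# Small made-up page: one normal table and one wrapped in a comment.
sample_html = """
<div><table id="advanced_passing"><tr><td>1</td></tr></table></div>
<div><!--
<table id="advanced_receiving"><tr><td>2</td></tr></table>
--></div>
"""

for table_id in ("advanced_passing", "advanced_receiving", "advanced_rushing"):
    print(table_id, locate_table(sample_html, table_id))
```

With the real page, data.text would be passed in place of sample_html.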

python web-scraping jupyter-lab
1 Answer

The table is inside an HTML comment, so you can do something like this (I haven't tested it, but it should work):

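import requests
from io import StringIO
from bs4 import BeautifulSoup, Comment
import pandas as pd


url = 'https://www.pro-football-reference.com/teams/buf/2023_advanced.htm'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
# The table is wrapped in an HTML comment, so walk every comment node
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    # Parse the comment body as HTML and look for the table by its id
    table = BeautifulSoup(comment, 'lxml').find('table', {'id': 'advanced_receiving'})
    if table:
        # read_html returns a list of DataFrames; StringIO avoids the
        # deprecation warning newer pandas emits for literal HTML strings
        df = pd.read_html(StringIO(str(table)))[0]
        print(df)
        break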
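
An alternative that is sometimes used for pages like this is to strip the comment markers from the HTML first, so that pd.read_html can see every table. This is a minimal sketch, assuming the hidden tables are wrapped in plain <!-- ... --> comments. The names uncomment_html and read_table_by_id are hypothetical helpers, and the pandas call has not been tested against the live page.

```python
import re
from io import StringIO

# Matches an HTML comment and captures its inner content.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def uncomment_html(html: str) -> str:
    """Replace each HTML comment with its inner content.

    Markup that was hidden inside comments becomes regular markup.
    """
    return COMMENT_RE.sub(lambda m: m.group(1), html)


def read_table_by_id(html: str, table_id: str):
    """Return the table with the given id as a DataFrame (hypothetical helper)."""
    # Imported here so uncomment_html can be used without pandas installed.
    import pandas as pd

    cleaned = uncomment_html(html)
    # attrs restricts the search to tables whose id attribute matches.
    return pd.read_html(StringIO(cleaned), attrs={"id": table_id})[0]


# Made-up fragment standing in for the real page source.
sample_html = (
    '<div><!-- <table id="advanced_receiving">'
    "<tr><th>Player</th></tr><tr><td>A</td></tr>"
    "</table> --></div>"
)
print(uncomment_html(sample_html))
```

With the live page, read_table_by_id(r.text, 'advanced_receiving') would take the place of the loop over comments. Selecting by id rather than by match text avoids depending on the caption wording.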