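This is my code from JupyterLab. I'm trying to get pandas to output the "Advanced Receiving" table the same way it does for "Advanced Passing", but I keep getting this error: "ValueError: No tables found matching pattern 'Advanced Receiving'". Here is the page that contains all the tables: https://www.pro-football-reference.com/teams/buf/2023_advanced.htm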
import requests
from bs4 import BeautifulSoup
standings_url = "https://www.pro-football-reference.com/years/2023/index.htm"
data = requests.get(standings_url)
soup = BeautifulSoup(data.text)
standings_table1 = soup.select('table.stats_table')[0]
standings_table2 = soup.select('table.stats_table')[1]
links1 = standings_table1.find_all('a')
links2 = standings_table2.find_all('a')
all_links = [l.get('href') for l in links1] + [l.get('href') for l in links2]
team_links = [l for l in all_links if '/teams/' in l]
base_url = "https://www.pro-football-reference.com"
full_team_links = [base_url + l for l in team_links]
full_team_links
team_url = full_team_links[0]
data = requests.get(team_url)
data.text
import pandas as pd
games = pd.read_html(data.text, match="Schedule & Game Results")
games[0]
soup = BeautifulSoup(data.text, 'html5lib')
links = soup.find_all('a')
links =[l.get("href") for l in links]
links = [l for l in links if l and '/2023_advanced.htm' in l]
links
data = requests.get(f"https://www.pro-football-reference.com{links[0]}")
data.text
passing = pd.read_html(data.text, match="Advanced Passing")[0]
receiving = pd.read_html(data.text, match="Advanced Receiving")[0]
passing.head()
receiving.head()
Check whether the content is loaded dynamically, and check whether your table-matching logic is incorrect.
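One rough way to run that check without a browser is to look at the raw response text. The sketch below is only an illustration: the helper name `where_is_table` and the sample HTML are made up, and it assumes the table carries an `id` attribute written with double quotes. It reports whether the table markup is absent from the response (which would point to JavaScript loading), sits inside an HTML comment, or is ordinary markup.

```python
import re

def where_is_table(html: str, table_id: str) -> str:
    """Rough text check: return 'missing', 'commented', or 'live'."""
    marker = f'id="{table_id}"'
    if marker not in html:
        # Not in the response at all, so it is probably added by JavaScript
        return "missing"
    # Collect the body of every <!-- ... --> block, including multi-line ones
    comments = re.findall(r"<!--(.*?)-->", html, flags=re.DOTALL)
    if any(marker in body for body in comments):
        return "commented"
    return "live"

# Hypothetical sample standing in for the real page source
sample = """
<div><table id="advanced_passing"><tr><td>A</td></tr></table></div>
<div><!-- <table id="advanced_receiving"><tr><td>B</td></tr></table> --></div>
"""

for tid in ("advanced_passing", "advanced_receiving", "advanced_rushing"):
    print(tid, where_is_table(sample, tid))
```

On the real page you would pass `data.text` instead of `sample`. A result of "commented" would explain why `pd.read_html` cannot see the table, since parsers skip comment contents.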
The tables are inside HTML comments, so you can do something like this (I haven't tested it, but it should work):
import requests
from io import StringIO
from bs4 import BeautifulSoup, Comment
import pandas as pd

r = requests.get('https://www.pro-football-reference.com/teams/buf/2023_advanced.htm')
soup = BeautifulSoup(r.text, 'lxml')
# Parse each HTML comment on its own and look for the table inside it
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    table = BeautifulSoup(comment, 'lxml').find('table', {'id': 'advanced_receiving'})
    if table:
        # read_html returns a list of DataFrames; wrap the markup in StringIO for newer pandas
        df = pd.read_html(StringIO(str(table)))[0]
        print(df)
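If you need more than one of the commented-out tables, the same idea can be wrapped in a small function. This is only a sketch run against a made-up HTML snippet rather than the live site: `read_commented_table` is a hypothetical helper name, the sample columns are invented, and it assumes `lxml` is installed for `pd.read_html`.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup, Comment

def read_commented_table(html, table_id):
    """Return the table with the given id from inside an HTML comment, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # Re-parse the comment body, since the outer parse treats it as plain text
        table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
        if table is not None:
            return pd.read_html(StringIO(str(table)))[0]
    return None

# Hypothetical stand-in for the page source
sample = """
<div><!--
<table id="advanced_receiving">
  <tr><th>Player</th><th>Tgt</th></tr>
  <tr><td>B</td><td>7</td></tr>
</table>
--></div>
"""

df = read_commented_table(sample, "advanced_receiving")
print(df)
```

With the real page you would call `read_commented_table(r.text, 'advanced_receiving')`. Returning `None` when nothing matches lets the caller tell "table not found" apart from a parsing error.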