Scraping with pd.read_html returns ValueError: No tables found


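I have tried several different approaches, all with the same result. I'm trying to scrape a table from the web and export it to .csv. I've gotten this to work on other sites, but for some reason I'm having no luck here.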

Here is one of my attempts:

import requests
import pandas as pd

url = 'https://www.oaklandyard.com/lg_standings/lg_standings.asp?LgSessCode=2731&ReturnPg=lg%5Fsoccer%5Fcoed%2Easp%232731&ShowRankings=False&HeaderTitle=&sw=1800'
response = requests.get(url)
html = response.text
tables = pd.read_html(html)
tables[0].to_csv('test.csv', index=False)

I tried using pandas to scrape this site. print(html) returns the page source, but pandas doesn't see the table.

pandas web-scraping python-requests lxml
1 Answer

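The page you are trying to scrape contains no <table> tags, which is why pd.read_html raises ValueError: No tables found. You have to process the page manually with requests and bs4: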

import requests
import bs4
import pandas as pd

url = 'https://www.oaklandyard.com/lg_standings/lg_standings.asp?LgSessCode=2731&ReturnPg=lg%5Fsoccer%5Fcoed%2Easp%232731&ShowRankings=False&HeaderTitle=&sw=1800'

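response = requests.get(url)
# Name the parser explicitly so bs4 does not guess one and emit a warning
soup = bs4.BeautifulSoup(response.text, 'html.parser')
# The standings live in the second 'divLargeTable' container on the page
table = soup.find_all('div', {'class': 'divLargeTable'})[1]
rows = table.find_all('div', {'class': 'divMultipleColumns'})
# The first row holds the column titles
headers = [title.text for title in rows[0].find_all('div', {'class': 'standingsTitle1'})]

data = {}
for row in rows[1:]:
    cols = row.find_all('div')
    # First cell is the team name; the remaining cells are integer stats
    index = cols[0].text.strip()
    data[index] = [int(value.text) for value in cols[1:]]

df = pd.DataFrame.from_dict(data, orient='index', columns=headers)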

Output:

>>> df
                       Wins  Losses  Ties  Scored  Allowed  Differential  Points
Old farts                 7       0     0      41       20            21      21
Killer Penguins           5       2     0      36       21            15      15
#BackHeelz                4       1     2      30       17            15      14
Misfits                   3       2     2      25       19             6      11
The Banshees              3       4     0      32       37            -5       9
Other Team Money Line     1       4     2      15       25           -10       5
Slackers                  1       5     1      18       27            -9       4
Shits & Dribbles          0       6     1      13       44           -31       1
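
Since the original goal was a .csv file, here is a rough offline sketch of the same idea that you can run without hitting the site. The SAMPLE_HTML string is a made-up, cut-down imitation of the div layout (it is not the real page source), and count_table_tags and standings_to_frame are hypothetical helper names, not part of pandas or bs4. It assumes the first cell of the header row carries no standingsTitle1 class, as the code above implies.

```python
import io

import bs4
import pandas as pd

# Hypothetical sample imitating the div-based layout; NOT the real page source.
SAMPLE_HTML = """
<div class="divLargeTable">
  <div class="divMultipleColumns">
    <div></div>
    <div class="standingsTitle1">Wins</div>
    <div class="standingsTitle1">Losses</div>
  </div>
  <div class="divMultipleColumns">
    <div>Old farts</div><div>7</div><div>0</div>
  </div>
  <div class="divMultipleColumns">
    <div>Misfits</div><div>3</div><div>2</div>
  </div>
</div>
"""


def count_table_tags(html: str) -> int:
    """Return how many <table> elements the markup contains."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    return len(soup.find_all("table"))


def standings_to_frame(html: str) -> pd.DataFrame:
    """Build a DataFrame from the div 'rows' (assumed layout, see SAMPLE_HTML)."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    # The sample has a single container, so take the first match here;
    # on the real page the answer above picks index [1].
    container = soup.find("div", class_="divLargeTable")
    rows = container.find_all("div", class_="divMultipleColumns")
    headers = [
        title.get_text(strip=True)
        for title in rows[0].find_all("div", class_="standingsTitle1")
    ]
    data = {}
    for row in rows[1:]:
        cols = row.find_all("div")
        team = cols[0].get_text(strip=True)
        data[team] = [int(cell.get_text(strip=True)) for cell in cols[1:]]
    return pd.DataFrame.from_dict(data, orient="index", columns=headers)


# Zero <table> tags means pd.read_html has nothing to find.
n_tables = count_table_tags(SAMPLE_HTML)

df = standings_to_frame(SAMPLE_HTML)

# Write to an in-memory buffer; pass a path such as 'test.csv' to write a file.
buffer = io.StringIO()
df.to_csv(buffer, index_label="Team")
csv_text = buffer.getvalue()
print(n_tables)
print(csv_text)
```

Note that the team names are the index here, so index=False would drop them from the file; index_label="Team" keeps them and gives that column a header.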