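I've tried several different approaches, all with the same result. I'm trying to scrape a table from the web and export it to .csv. I've gotten this to work on other sites, but for some reason I'm having no luck here.
Here is one of my attempts: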
import requests
import pandas as pd
url = 'https://www.oaklandyard.com/lg_standings/lg_standings.asp?LgSessCode=2731&ReturnPg=lg%5Fsoccer%5Fcoed%2Easp%232731&ShowRankings=False&HeaderTitle=&sw=1800'
response = requests.get(url)
html = response.text
tables = pd.read_html(html)
tables[0].to_csv('test.csv', index=False)
I tried using pandas to scrape this site. print(html) returns the page source, but pandas doesn't see the table.
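One way to check what pandas has to work with is to count the `<table>` tags in the fetched source. The sketch below does that with the standard library only. `TagCounter`, `count_tag`, and `sample_html` are hypothetical names made up for illustration, and the sample markup is a stand-in, not the real page source.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each start tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tag(html, tag):
    """Return the number of <tag> start tags found in the html string."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.counts.get(tag, 0)


# Hypothetical sample that mimics a div-based layout; not the real page source.
sample_html = """
<div class="divLargeTable">
  <div class="divMultipleColumns"><div>Team A</div><div>3</div></div>
</div>
"""

print(count_tag(sample_html, "table"))  # 0: there is no <table> for pd.read_html to find
print(count_tag(sample_html, "div"))
```

Running `count_tag(html, "table")` on the `html` string from the attempt above would show whether the page has any `<table>` elements at all.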
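The page you are trying to scrape contains no <table> tag, so pd.read_html has nothing to find. You have to process the page manually with requests and bs4: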
import requests
import bs4
import pandas as pd
url = 'https://www.oaklandyard.com/lg_standings/lg_standings.asp?LgSessCode=2731&ReturnPg=lg%5Fsoccer%5Fcoed%2Easp%232731&ShowRankings=False&HeaderTitle=&sw=1800'
response = requests.get(url)
soup = bs4.BeautifulSoup(response.text)
# The standings are built from nested <div>s; take the second 'divLargeTable' div
table = soup.find_all('div', {'class': 'divLargeTable'})[1]
rows = table.find_all('div', {'class': 'divMultipleColumns'})
# The first row carries the column titles
headers = [title.text for title in rows[0].find_all('div', {'class': 'standingsTitle1'})]
data = {}
for row in rows[1:]:
    cols = row.find_all('div')
    index = cols[0].text.strip()  # team name, used as the row label
    data[index] = [int(value.text) for value in cols[1:]]
df = pd.DataFrame.from_dict(data, orient='index', columns=headers)
Output:
>>> df
                       Wins  Losses  Ties  Scored  Allowed  Differential  Points
Old farts                 7       0     0      41       20            21      21
Killer Penguins           5       2     0      36       21            15      15
#BackHeelz                4       1     2      30       17            15      14
Misfits                   3       2     2      25       19             6      11
The Banshees              3       4     0      32       37            -5       9
Other Team Money Line     1       4     2      15       25           -10       5
Slackers                  1       5     1      18       27            -9       4
Shits & Dribbles          0       6     1      13       44           -31       1
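Since the original goal was a .csv file, here is a minimal sketch of the export step. It uses the standard `csv` module rather than pandas so that it runs on its own, and it assumes the same `headers` list and `data` dict shape built above. The helper name `standings_to_csv` and the `Team` column label are made up for illustration.

```python
import csv
import io


def standings_to_csv(headers, data, out, index_label="Team"):
    """Write {team: [values...]} rows to a file-like object as CSV."""
    writer = csv.writer(out)
    # Header row: the label for the team column, then the stat titles.
    writer.writerow([index_label] + list(headers))
    for team, values in data.items():
        if len(values) != len(headers):
            raise ValueError(
                f"{team!r}: expected {len(headers)} values, got {len(values)}"
            )
        writer.writerow([team] + list(values))


# Two rows copied from the output above, in the same shape as `headers` and `data`.
headers = ["Wins", "Losses", "Ties", "Scored", "Allowed", "Differential", "Points"]
data = {
    "Old farts": [7, 0, 0, 41, 20, 21, 21],
    "Killer Penguins": [5, 2, 0, 36, 21, 15, 15],
}

# An in-memory buffer keeps the example side-effect free; to write a real file,
# pass open("test.csv", "w", newline="") instead.
buffer = io.StringIO()
standings_to_csv(headers, data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

If you already have the DataFrame, `df.to_csv('test.csv', index_label='Team')` should give the same layout in one call. Keep the index in this case, because it holds the team names.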