When scraping a table from Wikipedia, I get "ValueError: cannot set a row with mismatched columns". See the code below. How can I fix this?
from bs4 import BeautifulSoup
import pandas as pd
import requests
url = 'https://en.wikipedia.org/wiki/List_of_largest_companies_by_revenue'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html')
table = soup.find_all('table')[0]
soup.find('tr')
world_companies = soup.find('tr')
df = pd.DataFrame(columns = world_table_companies)
df
table.find_all('tr')
column_data = table.find_all('tr')
for row in column_data[2:]:
    row_data = row.find_all('td')
    individual_row_data = [data.text.strip() for data in row_data]
    length = len(df)
    df.loc[length] = individual_row_data
ValueError: cannot set a row with mismatched columns
You don't need BeautifulSoup to scrape a table; pandas can do it on its own:
import pandas as pd
table_MN = pd.read_html('https://en.wikipedia.org/wiki/List_of_largest_companies_by_revenue')
for df in table_MN:
    if "Rank" in df.columns:
        print(df.to_string(index=False))
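As for why the original loop fails: `df.loc[len(df)] = row` raises this ValueError whenever the row has a different number of values than the DataFrame has columns. One plausible cause here (an assumption, I have not checked the page's current markup) is that some `tr` rows yield fewer `td` cells than there are headers, for example a second header row or a row whose first cell is a `th`. A minimal sketch of a guard, using made-up placeholder rows rather than the real Wikipedia data; `append_matching_rows` is a hypothetical helper name:

```python
import pandas as pd

def append_matching_rows(columns, rows):
    """Append only rows whose cell count equals the header count.

    Rows of any other length are collected and returned so they can be inspected.
    """
    df = pd.DataFrame(columns=columns)
    skipped = []
    for cells in rows:
        if len(cells) != len(columns):
            # Appending this row with df.loc would raise
            # "cannot set a row with mismatched columns".
            skipped.append(cells)
            continue
        df.loc[len(df)] = cells
    return df, skipped

# Placeholder data standing in for the scraped cells (not real figures).
columns = ["Rank", "Name", "Revenue"]
rows = [
    ["1", "Company A", "100"],
    ["Company B", "90"],  # one cell short, e.g. the rank was a <th>, not a <td>
    [],                   # e.g. a header row that contains no <td> cells
]

df, skipped = append_matching_rows(columns, rows)
print(df)
print("skipped rows:", skipped)
```

Printing the skipped rows should show which `tr` elements do not line up with the headers, which tells you whether to change the slice (`column_data[2:]`) or to collect `th` cells as well.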