Create a 3-column DataFrame in Pandas


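I want to scrape a football website to build a dataset in Pandas. I don't know how to put the scraped player data into 3 columns (name, league, football team) and add the country so that it fits a table/DataFrame.

The information has been scraped, although not very neatly, but I'm not sure whether (or how) I should create an array and loop the information into a list or an array.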

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://ng.soccerway.com/players/players_abroad/nigeria/'
# Send a browser-like User-Agent with the request
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(page.text, 'html')
table = soup.find_all('table', class_="playersabroad table")

# Collect the text of every <th> cell (expected to hold the country names)
player_country = soup.find_all('th')
player_country_header = [country.text.strip() for country in player_country]

print(player_country_header)

df = pd.DataFrame(columns=['player-name', 'League', 'team_name'])
#df = pd.DataFrame(columns = player_country_header ) df

# Collect the text of every <td> cell as one flat list
table_data = soup.find_all('td')
player_data_list = [data.text.strip() for data in table_data]
#length = len(df)
#df.loc[length] = player_data_list
print(player_data_list)
python pandas dataframe relational-database numpy-slicing
1 Answer

Here is a proposal using read_html with some post-processing:

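cols = ["player-name", "League", "team_name"]

# url is the one from the question; read_html returns a list of tables, take the first
tmp = pd.read_html(requests.get(
    url, headers={"User-Agent": "Mozilla/5.0"}).content)[0]

df = (
    tmp.T.reset_index().T # to slip down the incorrect 'England' header
        # move column 3 into 'country', keep the text before the first '.', forward-fill
        .assign(country=lambda x: x.pop(3).str.split(".").str[0].ffill())
        # drop the slipped-down header row, keep rows whose last column in tmp is NaN
        .iloc[1:].loc[tmp.iloc[:, -1].isna()]
        .set_axis(cols + ["country"], axis=1)
)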
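
To see what each step of the chain does without hitting the website, the same post-processing can be run on a small hand-made frame. This is only a sketch: the toy `tmp` below is an assumption about the shape read_html gives back (first country parsed as the header with suffixes like `.1`, later countries as rows repeated across all columns, player rows with an empty last column), not data taken from the page.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the table returned by read_html (assumed shape)
tmp = pd.DataFrame(
    [
        ["A. Iwobi", "Premier League", "Fulham", np.nan],
        ["T. Awoniyi", "Premier League", "Nottingham Forest", np.nan],
        ["Yemen", "Yemen", "Yemen", "Yemen"],  # country separator row
        ["S. Danjuma", "Yemeni League", "Al Ahli San'a", np.nan],
    ],
    columns=["England", "England.1", "England.2", "England.3"],
)

cols = ["player-name", "League", "team_name"]

df = (
    tmp.T.reset_index().T  # the header becomes an ordinary first row
        # column 3 carries the country names; forward-fill them onto player rows
        .assign(country=lambda x: x.pop(3).str.split(".").str[0].ffill())
        # drop the former header row, then keep only the player rows
        .iloc[1:].loc[tmp.iloc[:, -1].isna()]
        .set_axis(cols + ["country"], axis=1)
)

print(df)
```

The separator row for Yemen is dropped by the final filter, while its name survives in the country column of the rows that follow it.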

Output:

print(df)

      player-name          League          team_name  country
0        A. Iwobi  Premier League             Fulham  England
1      T. Awoniyi  Premier League  Nottingham Forest  England
2         O. Aina  Premier League  Nottingham Forest  England
3       F. Onyeka  Premier League          Brentford  England
4       C. Bassey  Premier League             Fulham  England
...           ...             ...                ...      ...
1078   S. Danjuma   Yemeni League      Al Ahli San'a    Yemen
1079  M. Alhassan   Yemeni League    Yarmuk al Rawda    Yemen
1080     A. Nweze   Yemeni League    Yarmuk al Rawda    Yemen
1081  A. Olalekan   Yemeni League      Al Sha'ab Ibb    Yemen
1082     A. Adisa   Yemeni League          Al Urooba    Yemen

[975 rows x 4 columns]
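
If you would rather stay with BeautifulSoup, as in the question, one option is to walk the table row by row, remember the most recent country header, and collect one record per player row. The sketch below runs on an inline HTML snippet whose markup is an assumption made for illustration (country names in `th` rows, player fields in the first three `td` cells); the real page may be structured differently.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup, not copied from the real page
html = """
<table class="playersabroad table">
  <thead><tr><th colspan="4">England</th></tr></thead>
  <tbody>
    <tr><td>A. Iwobi</td><td>Premier League</td><td>Fulham</td></tr>
    <tr><th colspan="4">Yemen</th></tr>
    <tr><td>S. Danjuma</td><td>Yemeni League</td><td>Al Ahli San'a</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

records = []
country = None  # most recent country header seen so far
for tr in soup.find_all("tr"):
    header = tr.find("th")
    if header is not None:
        # a header row only updates the current country
        country = header.get_text(strip=True)
        continue
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) >= 3:
        records.append(cells[:3] + [country])

df = pd.DataFrame(
    records, columns=["player-name", "League", "team_name", "country"]
)
print(df)
```

Building a list of rows first and creating the DataFrame once at the end avoids appending to the frame inside the loop.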