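I'm trying to build a process that parses data from uploaded fillable PDFs and arranges it in a DataFrame. I've managed to read the data from the fillable PDF into a df, but I've hit a problem: the individual players show up as columns instead of as a separate row per player. Currently each player appears as name_#, Position_#, Country_#, and so on.
This isn't the actual data I'm using, just a recreation of the scenario I'm facing.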
import pandas as pd

df = pd.DataFrame({'Team': ["Bayern", "Barcelona", "Madrid"],
'region': ["Bravaria","Barcelona", "Madrid"],
'title': ["Bundesliga","Laliga", "Champions Leauge"],
'name_1': ["Robben","Messi", "Ronaldo"],
'Position_1': ["RW","ST", "ST"],
'Country_1': ["Netherlands","Argentina", "Portugal"],
'name_2': ["Ribery","Neymar", "Benzema"],
'Position_2': ["LW","LW", "RW"],
'Country_2': ["FRANCE","Brazil", "France"]})
df
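I'm trying to find a way to reshape the df so that it looks like the df shared below. Each team has a different PDF, but they all have the same structure. Any ideas would be helpful. Thanks.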
df1 = pd.DataFrame({'Team': ["Bayern", "Barcelona", "Madrid", "Barcelona", "Madrid","Bayern"],
'region': ["Bravaria","Barcelona", "Madrid", "Barcelona", "Madrid","Bravaria"],
'title': ["Bundesliga","Laliga", "Champions Leauge","Laliga", "Champions Leauge","Bundesliga"],
'name': ["Robben","Messi", "Ronaldo","Neymar","Benzema","Ribery"],
'Position': ["RW","ST", "ST", "ST","RW","LW"],
'Country': ["Netherlands","Argentina", "Portugal","Brazil","France","France"]
}
)
df1
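Try:
# Collect each group of numbered columns into one list per row.
df["name"] = df.filter(regex=r"name_\d+").agg(list, axis=1)
df["country"] = df.filter(regex=r"Country_\d+").agg(list, axis=1)
df["position"] = df.filter(regex=r"Position_\d+").agg(list, axis=1)
# Explode the three list columns together (needs pandas 1.3 or newer),
# giving one row per player, then keep only the tidy columns.
print(
    df.explode(["name", "country", "position"])[
        ["Team", "region", "title", "name", "position", "country"]
    ]
)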
Prints:
Team region title name position country
0 Bayern Bravaria Bundesliga Robben RW Netherlands
0 Bayern Bravaria Bundesliga Ribery LW FRANCE
1 Barcelona Barcelona Laliga Messi ST Argentina
1 Barcelona Barcelona Laliga Neymar LW Brazil
2 Madrid Madrid Champions Leauge Ronaldo ST Portugal
2 Madrid Madrid Champions Leauge Benzema RW France
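As an alternative, here is a minimal sketch using `pd.wide_to_long`, which is designed for columns named `stub_number`. It assumes every player column follows the `name_#` / `Position_#` / `Country_#` pattern from the question, and that `Team`, `region` and `title` together identify each row uniquely. The names `long_df` and `player` are just illustrative choices, not something from the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Team": ["Bayern", "Barcelona", "Madrid"],
    "region": ["Bravaria", "Barcelona", "Madrid"],
    "title": ["Bundesliga", "Laliga", "Champions Leauge"],
    "name_1": ["Robben", "Messi", "Ronaldo"],
    "Position_1": ["RW", "ST", "ST"],
    "Country_1": ["Netherlands", "Argentina", "Portugal"],
    "name_2": ["Ribery", "Neymar", "Benzema"],
    "Position_2": ["LW", "LW", "RW"],
    "Country_2": ["FRANCE", "Brazil", "France"],
})

long_df = (
    pd.wide_to_long(
        df,
        stubnames=["name", "Position", "Country"],  # column prefixes to unpivot
        i=["Team", "region", "title"],              # columns identifying each row
        j="player",                                 # hypothetical name for the numeric suffix
        sep="_",
    )
    .reset_index()
    .drop(columns="player")  # the player number is not needed in the result
)

print(long_df)
```

With this approach the list of stub names is the only thing to change if the PDF form gains more per-player fields, and it should pick up additional numbered players (`name_3`, and so on) without further changes.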