I have a tennis dataset similar to the following:
import pandas as pd
player_name = ['Novak Djokovic','Rafael Nadal','Roger Federer','Andy Murray']
match_id = [202, 202, 203, 203]
score = ['6-3 5-7 6-4', '6-3 5-7 6-4', '6-4 7-6', '6-4 7-6']
outcome = [1,0,1,0]
df = pd.DataFrame(
{'player_name': player_name,
'match_id': match_id,
'score': score,
'outcome':outcome
})
df["sets"]= df["score"].str.split(" ", n = 3, expand = False)
df['set1_score'] = df['sets'].str[0]
df['set2_score'] = df['sets'].str[1]
df['set3_score'] = df['sets'].str[2]
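The outcome variable is 1 for a win and 0 for a loss. Currently, the scores are written from the winner's perspective. I would like to flip the set1_score, set2_score and set3_score columns whenever outcome is 0 (i.e. for the loser of the match), so that each set score is written from that player's perspective.
So, I would like the dataframe to look like this: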
player_name     match_id  score        outcome  sets             set1_score  set2_score  set3_score
Novak Djokovic  202       6-3 5-7 6-4  1        [6-3, 5-7, 6-4]  6-3         5-7         6-4
Rafael Nadal    202       6-3 5-7 6-4  0        [6-3, 5-7, 6-4]  3-6         7-5         4-6
Roger Federer   203       6-4 7-6      1        [6-4, 7-6]       6-4         7-6         NaN
Andy Murray     203       6-4 7-6      0        [6-4, 7-6]       4-6         6-7         NaN
How should I go about this?
Thanks!
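To flip the scores in your pandas DataFrame the way you describe, you can define a function that reverses each set score when outcome is 0 (a loss). You can then apply that function to every row with the apply method and axis=1, which makes it operate on rows rather than columns. Try the code below: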
import pandas as pd
# Your original DataFrame
player_name = ['Novak Djokovic', 'Rafael Nadal', 'Roger Federer', 'Andy Murray']
match_id = [202, 202, 203, 203]
score = ['6-3 5-7 6-4', '6-3 5-7 6-4', '6-4 7-6', '6-4 7-6']
outcome = [1, 0, 1, 0]
df = pd.DataFrame({
'player_name': player_name,
'match_id': match_id,
'score': score,
'outcome': outcome
})
# Function to reverse scores
def reverse_score(row):
    if row['outcome'] == 0:
        new_scores = []
        for set_score in row['score'].split():
            a, b = set_score.split('-')
            new_scores.append(f"{b}-{a}")
        return ' '.join(new_scores)
    else:
        return row['score']
# Apply the function to each row, leaving the original 'score' column untouched
player_score = df.apply(reverse_score, axis=1)
# Split the scores into individual sets
df['sets'] = df['score'].str.split(' ')
player_sets = player_score.str.split(' ')
df['set1_score'] = player_sets.str[0]
df['set2_score'] = player_sets.str[1]
df['set3_score'] = player_sets.str[2]
print(df)
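If the dataset is large, a row-wise apply can get slow. As an alternative, here is a minimal sketch that does the same flip with pandas string methods. It assumes every set score is plain digits separated by a hyphen (something like 7-6(5) with a tiebreak would be left unflipped), and the names sets, flipped and is_loser are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'player_name': ['Novak Djokovic', 'Rafael Nadal', 'Roger Federer', 'Andy Murray'],
    'match_id': [202, 202, 203, 203],
    'score': ['6-3 5-7 6-4', '6-3 5-7 6-4', '6-4 7-6', '6-4 7-6'],
    'outcome': [1, 0, 1, 0],
})

# One column per set; matches with fewer sets get missing values on the right.
sets = df['score'].str.split(' ', expand=True)
sets.columns = [f'set{i + 1}_score' for i in sets.columns]

# Swap the two sides of every set, e.g. '6-3' -> '3-6'.
# Assumption: scores look like '<digits>-<digits>'; missing values stay missing.
flipped = sets.apply(
    lambda col: col.str.replace(r'^(\d+)-(\d+)$', r'\2-\1', regex=True)
)

# Keep the winner's rows as they are and use the flipped values for the losers.
is_loser = df['outcome'].eq(0)
for col in sets.columns:
    df[col] = sets[col].where(~is_loser, flipped[col])

print(df)
```

This version never modifies the score column, so the original winner-perspective string stays available next to the per-player set columns.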