Original DF
| performance | driver_id | season |
|-------------|-----------|--------|
| 1 | 1 | 17 |
| 2 | 1 | 17 |
| 3 | 2 | 17 |
| 4 | 2 | 18 |
| 5 | 2 | 18 |
| 6 | 2 | 19 |
| 7 | 1 | 17 |
| 8 | 1 | 18 |
| 9 | 1 | 18 |
Desired DF
| performance | driver_id | season | season_order |
|-------------|-----------|--------|--------------|
| 1 | 1 | 17 | 1 |
| 2 | 1 | 17 | 1 |
| 3 | 2 | 17 | 1 |
| 4 | 2 | 18 | 1 |
| 5 | 2 | 18 | 1 |
| 6 | 2 | 19 | 2 |
| 7 | 1 | 17 | 1 |
| 8 | 1 | 18 | 2 |
| 9 | 1 | 18 | 2 |
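I have a DF, and I want to label each driver's performances as belonging to their first, second, third season, and so on. The numbering has to be computed separately for each driver.
I tried using `cumsum` and `rank`, but they only helped with ordering/ranking within the groups I wanted. From my experiments, these methods seem to work better when grouping by a single column.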
df["season_number"] = (df
    .groupby(["driver_id", "season"])
    .season
    .transform("rank")
)
IIUC, use:
df["season_number"] = df.groupby(["driver_id"]).season.rank(method='dense').astype(int)
print(df)
performance driver_id season season_number
0 1 1 17 1
1 2 1 17 1
2 3 2 17 1
3 4 2 18 2
4 5 2 18 2
5 6 2 19 3
6 7 1 17 1
7 8 1 18 2
8 9 1 18 2
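
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample data from the question, rebuilt by hand as a DataFrame, and the `season_number` column name used above.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the "Original DF" table).
df = pd.DataFrame({
    "performance": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "driver_id":   [1, 1, 2, 2, 2, 2, 1, 1, 1],
    "season":      [17, 17, 17, 18, 18, 19, 17, 18, 18],
})

# Dense rank of `season` within each driver:
# equal seasons share a rank, and ranks increase by 1 with no gaps.
df["season_number"] = (
    df.groupby("driver_id")["season"]
      .rank(method="dense")
      .astype(int)
)

print(df["season_number"].tolist())
# [1, 1, 1, 2, 2, 3, 1, 2, 2]
```

The row order of the frame does not matter here, since the rank is based on the `season` values rather than on the position of the rows.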
You should group by `driver_id` only and use a dense rank:
df['season_number'] = (
    df.groupby('driver_id')['season']
      .rank(method='dense')
      .astype(int)
)
Output:
performance driver_id season season_number
0 1 1 17 1
1 2 1 17 1
2 3 2 17 1
3 4 2 18 2
4 5 2 18 2
5 6 2 19 3
6 7 1 17 1
7 8 1 18 2
8 9 1 18 2
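
To see why the grouping and the rank method both matter, the sketch below compares three variants on the question's sample data. The variable names (`by_both`, `min_rank`, `dense_rank`) are only illustrative. When `season` is one of the grouping keys, every group holds a single season value, so ranking inside it can only report ties. A non-dense method such as `min` leaves gaps after ties.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "performance": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "driver_id":   [1, 1, 2, 2, 2, 2, 1, 1, 1],
    "season":      [17, 17, 17, 18, 18, 19, 17, 18, 18],
})

# The question's attempt: grouping by both columns means every row in a group
# has the same season, so the default (average) rank just averages tied positions.
by_both = df.groupby(["driver_id", "season"])["season"].transform("rank")

# Grouping by driver only, but with method="min": ties share a rank,
# yet the next distinct season skips ahead by the number of tied rows.
min_rank = df.groupby("driver_id")["season"].rank(method="min").astype(int)

# Grouping by driver only with method="dense": consecutive season numbers.
dense_rank = df.groupby("driver_id")["season"].rank(method="dense").astype(int)

print(by_both.tolist())     # [2.0, 2.0, 1.0, 1.5, 1.5, 1.0, 2.0, 1.5, 1.5]
print(min_rank.tolist())    # [1, 1, 1, 2, 2, 4, 1, 4, 4]
print(dense_rank.tolist())  # [1, 1, 1, 2, 2, 3, 1, 2, 2]
```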