Pandas groupby: return an incremental group number as a new column

Question (votes: 0, answers: 2)

Original DF

| performance | driver_id | season |
|-------------|-----------|--------|
| 1           | 1         | 17     |
| 2           | 1         | 17     |
| 3           | 2         | 17     |
| 4           | 2         | 18     |
| 5           | 2         | 18     |
| 6           | 2         | 19     |
| 7           | 1         | 17     |
| 8           | 1         | 18     |
| 9           | 1         | 18     |

Desired DF

| performance | driver_id | season | season_order |
|-------------|-----------|--------|--------------|
| 1           | 1         | 17     | 1            |
| 2           | 1         | 17     | 1            |
| 3           | 2         | 17     | 1            |
| 4           | 2         | 18     | 2            |
| 5           | 2         | 18     | 2            |
| 6           | 2         | 19     | 3            |
| 7           | 1         | 17     | 1            |
| 8           | 1         | 18     | 2            |
| 9           | 1         | 18     | 2            |

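I have a DF in which I want to label each driver's performances as belonging to their first, second, third season, and so on. The numbering has to be computed separately for each driver.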

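I have tried `cumsum` and `rank`, but they only help with ordering/ranking inside the group I select. From my experiments, these methods seem to work better when grouping on a single column.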

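# Attempt: rank `season` within each (driver_id, season) group.
# Every group holds a single season value, so this does not
# number the seasons per driver.
df["season_number"] = (
    df.groupby(["driver_id", "season"])
    .season
    .transform("rank")
)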
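A minimal sketch of why grouping by both `driver_id` and `season` does not give season numbers: each group then contains only one distinct season, so `rank` sees nothing but ties and returns their average position. The DataFrame is rebuilt from the table above; the variable name `attempt` is only illustrative.

```python
import pandas as pd

# Data reproduced from the "Original DF" table.
df = pd.DataFrame({
    "performance": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "driver_id":   [1, 1, 2, 2, 2, 2, 1, 1, 1],
    "season":      [17, 17, 17, 18, 18, 19, 17, 18, 18],
})

# Rank `season` inside each (driver_id, season) group.
# All values in a group are equal, so the default method="average"
# assigns every row the mean of the tied positions.
attempt = df.groupby(["driver_id", "season"])["season"].transform("rank")

print(attempt.tolist())
# [2.0, 2.0, 1.0, 1.5, 1.5, 1.0, 2.0, 1.5, 1.5]
```

The values depend only on how many rows each (driver, season) pair has, not on the order of the seasons.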
python pandas group-by
2 Answers

0 votes

IIUC, use:

df["season_number"] = df.groupby(["driver_id"]).season.rank(method='dense').astype(int)
print(df)
   performance  driver_id  season  season_number
0            1          1      17              1
1            2          1      17              1
2            3          2      17              1
3            4          2      18              2
4            5          2      18              2
5            6          2      19              3
6            7          1      17              1
7            8          1      18              2
8            9          1      18              2
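To illustrate why `method='dense'` matters here, the sketch below compares it with `method='min'` on the same data. The column names `dense` and `min_rank` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Data reproduced from the question.
df = pd.DataFrame({
    "performance": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "driver_id":   [1, 1, 2, 2, 2, 2, 1, 1, 1],
    "season":      [17, 17, 17, 18, 18, 19, 17, 18, 18],
})

seasons_by_driver = df.groupby("driver_id")["season"]

# Dense rank: tied rows share a rank and the next rank is always +1.
df["dense"] = seasons_by_driver.rank(method="dense").astype(int)

# Min rank: tied rows share a rank, but the next rank skips ahead
# by the number of tied rows, leaving gaps.
df["min_rank"] = seasons_by_driver.rank(method="min").astype(int)

print(df[["driver_id", "season", "dense", "min_rank"]])
```

For driver 1, whose three rows in season 17 are followed by two rows in season 18, the dense rank yields 1, 1, 1, 2, 2, while the min rank yields 1, 1, 1, 4, 4.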

0 votes

You should group by `driver_id` only and use a dense rank:

df['season_number'] = (
    df.groupby('driver_id')['season']
    .rank(method='dense')
    .astype(int)
)

Output:

   performance  driver_id  season  season_number
0            1          1      17              1
1            2          1      17              1
2            3          2      17              1
3            4          2      18              2
4            5          2      18              2
5            6          2      19              3
6            7          1      17              1
7            8          1      18              2
8            9          1      18              2
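As an alternative sketch (not taken from either answer), the same numbering can be produced by factorizing the sorted seasons within each driver. The helper `season_codes` is a hypothetical name introduced here for illustration.

```python
import pandas as pd

# Data reproduced from the question.
df = pd.DataFrame({
    "performance": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "driver_id":   [1, 1, 2, 2, 2, 2, 1, 1, 1],
    "season":      [17, 17, 17, 18, 18, 19, 17, 18, 18],
})


def season_codes(s: pd.Series) -> pd.Series:
    """Hypothetical helper: 1-based code of each value among the sorted uniques."""
    codes, _ = pd.factorize(s, sort=True)  # codes start at 0
    return pd.Series(codes + 1, index=s.index)


# Apply the helper per driver; the result is aligned back to df's index.
df["season_number"] = df.groupby("driver_id")["season"].transform(season_codes)

print(df)
```

This version works directly with integers, so no `astype(int)` is needed. One difference to keep in mind: a missing season would be coded as 0 here, whereas `rank` would return NaN and the subsequent `astype(int)` would raise.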