Pandas - subtract the minimum date from the maximum date for each group

Question (votes: 4, answers: 1)

I want to add a column to this table that holds, for each customer_id, the maximum date minus the minimum date.

Input:

action_date customer_id
 2017-08-15       1
 2017-08-21       1
 2017-08-21       1
 2017-09-02       1
 2017-08-28       2
 2017-09-29       2
 2017-10-15       3
 2017-10-30       3
 2017-12-05       3

The result should be this table, where diff is in days.

Output:

action_date customer_id    diff
 2017-08-15       1         18
 2017-08-21       1         18
 2017-08-21       1         18
 2017-09-02       1         18
 2017-08-28       2         32
 2017-09-29       2         32
 2017-10-15       3         51
 2017-10-30       3         51
 2017-12-05       3         51

I tried this code, but it fills the column with a lot of NaN values:

group = df.groupby(by='customer_id')
df['diff'] = (group['action_date'].max() - group['action_date'].min()).dt.days
Tags: python pandas group-by
1 Answer
8 votes

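You can use the transform method, which returns a result aligned with the original rows rather than one row per group: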

In [23]: df['diff'] = df.groupby('customer_id') \
                        ['action_date'] \
                        .transform(lambda x: (x.max()-x.min()).days)

In [24]: df
Out[24]:
  action_date  customer_id  diff
0  2017-08-15            1    18
1  2017-08-21            1    18
2  2017-08-21            1    18
3  2017-09-02            1    18
4  2017-08-28            2    32
5  2017-09-29            2    32
6  2017-10-15            3    51
7  2017-10-30            3    51
8  2017-12-05            3    51
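
For completeness, here is a self-contained sketch of the same idea, assuming action_date is already a datetime64 column (for example after pd.to_datetime). It also shows why the attempt in the question produced NaN: the aggregated result is indexed by customer_id (1, 2, 3), so assigning it to df aligns on the row index 0..8 and only rows whose index label happens to match a customer_id get a value. The names per_customer and diff_via_map are mine and only for illustration.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "action_date": pd.to_datetime([
        "2017-08-15", "2017-08-21", "2017-08-21", "2017-09-02",
        "2017-08-28", "2017-09-29",
        "2017-10-15", "2017-10-30", "2017-12-05",
    ]),
    "customer_id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
})

dates = df.groupby("customer_id")["action_date"]

# Aggregating yields ONE row per customer, indexed by customer_id.
# Assigning this directly to df["diff"] aligns on df's row index,
# which is what leaves most rows as NaN in the question's attempt.
per_customer = (dates.max() - dates.min()).dt.days

# transform broadcasts each group's max/min back onto the original rows,
# so the subtraction stays aligned with df and no lambda is needed.
df["diff"] = (dates.transform("max") - dates.transform("min")).dt.days

# Alternative: look up the per-customer value through customer_id
df["diff_via_map"] = df["customer_id"].map(per_customer)

print(df)
```

Using the built-in "max" and "min" transforms instead of a lambda keeps the work vectorized, which may matter on large frames; on a table this small either form is fine.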