I have a pandas DataFrame:
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'date': ['2023-01-15', '2023-02-20', '2023-01-10', '2023-03-05', '2023-02-01', '2023-04-10'],
    'value': [10, 15, 5, 25, 8, 12],
})
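Inside a single aggregation, I'm trying to get, for each 'group', the 'value' on the row with the minimum 'date' and the 'value' on the row with the maximum 'date':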
## the following doesn't work
output = (
    data
    .groupby(['group'], as_index=False).agg(
        ## there are some other additional aggregate functions happening here too.
        value_at_min=('value', lambda x: x.loc[x['date'].idxmin()]),
        value_at_max=('value', lambda x: x.loc[x['date'].idxmax()]),
    ))
This fails even after converting the dates to datetime (in fact, my original date column is already datetime).
The expected output is:
  group    min_date    max_date  value_at_min  value_at_max
0     A  2023-01-15  2023-02-20            10            15
1     B  2023-01-10  2023-03-05             5            25
2     C  2023-02-01  2023-04-10             8            12
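As far as I can tell, the attempt fails because a named aggregation of the form ('value', func) hands func only the 'value' Series of each group, so x['date'] is treated as a lookup in that Series' index rather than as a column. If you want to keep everything inside one agg call, a minimal sketch is to look the dates up in the original frame through the group's index. The helper names value_at_min and value_at_max are made up for this illustration, and the sketch assumes the frame has a unique index:

```python
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'date': ['2023-01-15', '2023-02-20', '2023-01-10', '2023-03-05', '2023-02-01', '2023-04-10'],
    'value': [10, 15, 5, 25, 8, 12],
})
data['date'] = pd.to_datetime(data['date'])

# Hypothetical helpers: each receives the 'value' Series of one group.
# s.index still carries the original row labels, so the matching dates
# can be fetched from `data` (assumes a unique index).
def value_at_min(s):
    return s.loc[data.loc[s.index, 'date'].idxmin()]

def value_at_max(s):
    return s.loc[data.loc[s.index, 'date'].idxmax()]

output = data.groupby('group', as_index=False).agg(
    min_date=('date', 'min'),
    max_date=('date', 'max'),
    value_at_min=('value', value_at_min),
    value_at_max=('value', value_at_max),
)
print(output)
```

Python-level functions like these are called once per group, so this is likely to be slower on large frames than approaches that avoid custom callables.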
I would rather get the idxmin/idxmax of the dates and then slice the original DataFrame:
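# Make sure 'date' is a real datetime so min/max compare chronologically
data['date'] = pd.to_datetime(data['date'])
# Row labels of the earliest and latest date in each group
tmp = data.groupby('group')['date'].agg(['idxmin', 'idxmax'])
# Pick those rows and put them side by side, one row per group
out = (data.loc[tmp['idxmin']]
           .merge(data.loc[tmp['idxmax']],
                  on='group', suffixes=('_min', '_max'))
      )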
Output:
  group    date_min  value_min    date_max  value_max
0     A  2023-01-15         10  2023-02-20         15
1     B  2023-01-10          5  2023-03-05         25
2     C  2023-02-01          8  2023-04-10         12
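If the exact column names from the question matter, another option is to sort by date first and then take the first and last row of each group with the built-in 'first' and 'last' aggregations. This is only a sketch under two assumptions: the 'date' column is datetime, and 'value' has no missing entries (the groupby 'first'/'last' aggregations skip NaN, which could silently pick a different row). The variable name result is arbitrary:

```python
import pandas as pd

data = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'date': ['2023-01-15', '2023-02-20', '2023-01-10', '2023-03-05', '2023-02-01', '2023-04-10'],
    'value': [10, 15, 5, 25, 8, 12],
})
data['date'] = pd.to_datetime(data['date'])

# After sorting by date, the first row of each group holds the earliest
# date and the last row holds the latest one.
result = (
    data.sort_values('date')
        .groupby('group', as_index=False)
        .agg(
            min_date=('date', 'first'),
            max_date=('date', 'last'),
            value_at_min=('value', 'first'),
            value_at_max=('value', 'last'),
        )
)
print(result)
```

Other aggregations can be added to the same agg call, since everything here uses plain named aggregation.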