I have many similar rows in a pandas DataFrame, like this:
Date | Position |
---|---|
2023-08-01 12:01:00 | A23 |
2023-08-01 12:20:00 | A23 |
2023-08-01 13:10:10 | A23 |
2023-08-02 12:00:00 | B12 |
2023-08-02 12:01:00 | A23 |
2023-08-02 12:05:00 | A23 |
I need to aggregate the rows by "Position" and merge the datetimes into ranges, like this:
Date | Date2 | Position |
---|---|---|
2023-08-01 12:01:00 | 2023-08-01 13:10:10 | A23 |
2023-08-02 12:00:00 | NaN | B12 |
2023-08-02 12:01:00 | 2023-08-02 12:05:00 | A23 |
Thank you.
Use groupby.agg followed by a post-processing step:
import pandas as pd

# ensure datetime
df['Date'] = pd.to_datetime(df['Date'])

# label each run of successive identical positions with its own group id
group = df['Position'].ne(df['Position'].shift()).cumsum()

out = (df
   .groupby(group, as_index=False)
   .agg(Date=('Date', 'min'),
        Date2=('Date', 'max'),
        Position=('Position', 'first'),
        n=('Position', 'count')
        )
   # hide Date2 if the group contains only one row
   # you could also check that Date ≠ Date2
   .assign(Date2=lambda d: d['Date2'].where(d.pop('n').gt(1)))
)
Note: to group by position and calendar date instead, use .groupby(['Position', df['Date'].dt.normalize()], as_index=False).
Output:
                 Date               Date2 Position
0 2023-08-01 12:01:00 2023-08-01 13:10:10      A23
1 2023-08-02 12:00:00                 NaT      B12
2 2023-08-02 12:01:00 2023-08-02 12:05:00      A23
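
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same idea. The function name merge_consecutive and the helper name run_id are made up for illustration. It also uses the default index plus reset_index(drop=True) rather than as_index=False, which is only a stylistic variation on the code above, not a different technique.

```python
import pandas as pd


def merge_consecutive(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse each run of consecutive identical Position values into one row."""
    # Work on a copy with a proper datetime column (the input is left untouched).
    df = df.assign(Date=pd.to_datetime(df["Date"]))

    # A new run starts whenever Position differs from the previous row;
    # the cumulative sum of those "change" flags gives one id per run.
    run_id = df["Position"].ne(df["Position"].shift()).cumsum().rename("run_id")

    out = (
        df.groupby(run_id)
        .agg(
            Date=("Date", "min"),
            Date2=("Date", "max"),
            Position=("Position", "first"),
            n=("Position", "count"),
        )
        .reset_index(drop=True)
    )

    # Blank out Date2 for runs that contain a single row.
    out["Date2"] = out["Date2"].where(out["n"].gt(1))
    return out.drop(columns="n")


# Sample data copied from the question.
sample = pd.DataFrame(
    {
        "Date": [
            "2023-08-01 12:01:00",
            "2023-08-01 12:20:00",
            "2023-08-01 13:10:10",
            "2023-08-02 12:00:00",
            "2023-08-02 12:01:00",
            "2023-08-02 12:05:00",
        ],
        "Position": ["A23", "A23", "A23", "B12", "A23", "A23"],
    }
)

result = merge_consecutive(sample)
print(result)
```

Because the grouping key is the run id rather than Position itself, the two separate A23 stretches stay as separate rows. This assumes the rows are already sorted in the order you want to merge them.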