I have a DataFrame like this:
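import pandas as pd

data = {
    # Times of day are stored as strings so that the dict literal is valid Python
    "timeStamp": ["06:00:00", "06:03:00", "06:10:00", "06:30:00", "06:32:00", "06:02:00", "06:05:00", "06:06:00", "06:55:00", "06:00:00", "06:01:00", "06:20:00", "07:00:00"],
    "Event": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "D"],
}
df = pd.DataFrame(data)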
For each group, I need the shortest interval that contains 3 or more rows.
For the given example, the desired output looks like this:
Event | Interval |
---|---|
A | 00:10:00 |
B | 00:04:00 |
C | 00:20:00 |
D | N/A |
...
and so on.
Is there an elegant way to do this?
You can use groupby("Event") and then apply a custom aggregation function.
# Convert to datetime so that intervals can be computed by subtraction
df['timeStamp'] = pd.to_datetime(df['timeStamp'])

def find_shortest_interval_3(group):
    # A group with fewer than 3 rows has no valid interval
    if len(group) < 3:
        return None
    group = group.sort_values('timeStamp')
    min_interval = pd.Timedelta.max
    # Slide a window of 3 consecutive rows; you can parameterize this so that it is not always 3
    for i in range(len(group) - 2):
        current_interval = group.iloc[i + 2]['timeStamp'] - group.iloc[i]['timeStamp']
        if current_interval < min_interval:
            min_interval = current_interval
    return min_interval

print(df.groupby('Event').apply(find_shortest_interval_3))
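As the comment in the loop notes, the window size does not have to be fixed at 3. Below is a minimal sketch of a parameterized variant that also avoids the explicit Python loop by using a grouped shift. The function name `shortest_interval`, its parameters `n`, `time_col` and `group_col`, and the `example` frame are hypothetical names chosen for illustration and are not part of the original answer. It assumes the time column already has a datetime or timedelta dtype.

```python
import pandas as pd

def shortest_interval(df, n=3, time_col="timeStamp", group_col="Event"):
    """Shortest span covering n consecutive rows per group (NaT if a group has fewer than n rows)."""
    # Sort so that rows of each group are in chronological order
    ordered = df.sort_values([group_col, time_col])
    # Span of each window of n rows: current time minus the time n - 1 rows earlier in the same group
    span = ordered[time_col] - ordered.groupby(group_col)[time_col].shift(n - 1)
    # min() skips the NaT entries produced at the start of each group
    return span.groupby(ordered[group_col]).min()

# Example data mirroring the question; times of day are parsed as timedeltas
example = pd.DataFrame({
    "timeStamp": pd.to_timedelta([
        "06:00:00", "06:03:00", "06:10:00", "06:30:00", "06:32:00",
        "06:02:00", "06:05:00", "06:06:00", "06:55:00",
        "06:00:00", "06:01:00", "06:20:00",
        "07:00:00",
    ]),
    "Event": list("AAAAABBBBCCCD"),
})

result = shortest_interval(example, n=3)
print(result)
```

Groups with fewer than `n` rows come back as `NaT`, which plays the role of N/A in the desired output. Calling it with `n=2` would give the shortest gap between consecutive rows instead.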