I have a time series like the following:
mode timestamp
0 A 2021-06-29 00:00:00-04:00
1 A 2021-06-29 01:00:00-04:00
2 A 2021-06-29 02:00:00-04:00
3 A 2021-06-29 03:00:00-04:00
4 B 2021-06-29 04:00:00-04:00
5 B 2021-06-29 05:00:00-04:00
6 B 2021-06-29 06:00:00-04:00
7 B 2021-06-29 07:00:00-04:00
8 B 2021-06-29 08:00:00-04:00
9 B 2021-06-29 09:00:00-04:00
10 B 2021-06-29 10:00:00-04:00
11 A 2021-06-29 11:00:00-04:00
12 A 2021-06-29 12:00:00-04:00
13 A 2021-06-29 13:00:00-04:00
14 A 2021-06-29 14:00:00-04:00
15 A 2021-06-29 15:00:00-04:00
16 A 2021-06-29 16:00:00-04:00
17 A 2021-06-29 17:00:00-04:00
18 A 2021-06-29 18:00:00-04:00
19 A 2021-06-29 19:00:00-04:00
20 C 2021-06-29 20:00:00-04:00
21 C 2021-06-29 21:00:00-04:00
22 C 2021-06-29 22:00:00-04:00
23 C 2021-06-29 23:00:00-04:00
24 C 2021-06-29 00:00:00-04:00
25 C 2021-06-29 01:00:00-04:00
26 C 2021-06-29 02:00:00-04:00
27 C 2021-06-29 03:00:00-04:00
I am trying to produce a list of dictionaries, one per mode interval, each with its start and end time. Something like this:
[
{"mode": "A", "start_time": "2021-06-29 00:00:00-04:00", "end_time": "2021-06-29 03:00:00-04:00"},
{"mode": "B", "start_time": "2021-06-29 04:00:00-04:00", "end_time": "2021-06-29 10:00:00-04:00"},
{"mode": "A", "start_time": "2021-06-29 11:00:00-04:00", "end_time": "2021-06-29 19:00:00-04:00"},
{"mode": "C", "start_time": "2021-06-29 20:00:00-04:00", "end_time": "2021-06-29 03:00:00-04:00"},
]
The time series is very long, and looping over it row by row is very slow. Any suggestions on how to do this efficiently?
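Use groupby.agg and to_dict. Grouping by mode alone would merge the two separate A runs into a single interval (A from 00:00 to 19:00), so group by a run id that increments every time mode differs from the previous row:
# run id: increments each time mode changes from the previous row
runs = df['mode'].ne(df['mode'].shift()).cumsum().rename('run')
out = (df.groupby(runs)
         .agg(mode=('mode', 'first'),
              start_time=('timestamp', 'first'),
              end_time=('timestamp', 'last'))
         .to_dict('records')
      )
Output:
[{'mode': 'A', 'start_time': '2021-06-29 00:00:00-04:00', 'end_time': '2021-06-29 03:00:00-04:00'},
 {'mode': 'B', 'start_time': '2021-06-29 04:00:00-04:00', 'end_time': '2021-06-29 10:00:00-04:00'},
 {'mode': 'A', 'start_time': '2021-06-29 11:00:00-04:00', 'end_time': '2021-06-29 19:00:00-04:00'},
 {'mode': 'C', 'start_time': '2021-06-29 20:00:00-04:00', 'end_time': '2021-06-29 03:00:00-04:00'}]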
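If you want to check the run-based grouping end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, assuming the timestamp column holds plain strings (as the printed output suggests), and wraps the steps in a helper; the name mode_intervals is hypothetical. Everything is vectorized, so there is no Python-level loop over rows.

```python
import pandas as pd

# Sample data from the question: runs of 4 x A, 7 x B, 9 x A, 8 x C (28 hourly rows).
# Assumption: timestamps are plain strings, matching the question's output.
hours = list(range(24)) + list(range(4))
modes = ["A"] * 4 + ["B"] * 7 + ["A"] * 9 + ["C"] * 8
df = pd.DataFrame({
    "mode": modes,
    "timestamp": [f"2021-06-29 {h:02d}:00:00-04:00" for h in hours],
})


def mode_intervals(frame: pd.DataFrame) -> list:
    """Collapse consecutive rows with the same mode into one interval each (hypothetical helper)."""
    # True on the first row of every run; shift() gives NaN for row 0, so it counts as a change
    is_new_run = frame["mode"].ne(frame["mode"].shift())
    # The running total of run starts gives every row in a run the same integer id
    run_id = is_new_run.cumsum().rename("run")
    return (
        frame.groupby(run_id)
        .agg(
            mode=("mode", "first"),
            start_time=("timestamp", "first"),
            end_time=("timestamp", "last"),
        )
        .to_dict("records")  # the run id lives in the index, which 'records' drops
    )


if __name__ == "__main__":
    for interval in mode_intervals(df):
        print(interval)
```

Because the key is a run id rather than the mode itself, the second A run gets its own group. If your timestamp column is datetime64 instead of strings, the start_time and end_time values come back as pandas Timestamp objects rather than strings.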