I am using pandas version 1.0.5.
import pandas as pd
dat1 = [
['2023-12-27','2023-12-27 00:00:00','2023-12-27 02:14:00'],
['2023-12-27','2023-12-27 03:16:00','2023-12-27 04:19:00'],
['2023-12-27','2023-12-27 18:11:00','2023-12-27 20:13:00'],
['2023-12-28','2023-12-28 01:16:00','2023-12-28 02:14:00'],
['2023-12-28','2023-12-28 02:16:00','2023-12-28 02:28:00'],
['2023-12-28','2023-12-28 02:30:00','2023-12-28 02:56:00'],
['2023-12-28','2023-12-28 18:45:00','2023-12-28 19:00:00'],
['2023-12-29','2023-12-29 01:16:00','2023-12-29 02:13:00'],
['2023-12-29','2023-12-29 04:16:00','2023-12-29 05:09:00'],
['2023-12-29','2023-12-29 05:11:00','2023-12-29 05:14:00'],
['2023-12-29','2023-12-29 18:00:00','2023-12-29 19:00:00']
]
df = pd.DataFrame(dat1,columns = ['date','Start_tmp','End_tmp'])
df["Start_tmp"] = pd.to_datetime(df["Start_tmp"])
df["End_tmp"] = pd.to_datetime(df["End_tmp"])
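My dataframe looks like this:
I need to find the common, i.e. overlapping, intervals between the timestamps.
For example, one of the overlapping times across all three dates (highlighted in yellow) is 1:16 - 2:13. Another (highlighted in blue) is 18:45 - 19:00.
So my expected output is this:
[57,15]
57 - minutes between 1:16 and 2:13.
15 - minutes between 18:45 and 19:00.
Any leads on how to achieve this output? Thanks.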
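Since you are only interested in the time of day, I would convert each datetime to a time and track the current interval as a tuple holding its start, its end, and a flag saying whether it has already been merged with another interval: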
(start_time: datetime.time, end_time: datetime.time, already_merged: Boolean)
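After sorting, we can loop over the intervals and check whether two consecutive ones overlap. If they do, we keep only their intersection, i.e. the maximum of the two starts and the minimum of the two ends, and continue tracking that interval.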
intervals = [(x.time(), y.time()) for x, y in zip(df["Start_tmp"], df["End_tmp"])]
intervals = sorted(intervals)
def time_to_minutes(t):
    return t.hour * 60 + t.minute

result = []
cur = (intervals[0][0], intervals[0][1], False)
for i in range(1, len(intervals)):
    # Is the current interval overlapping with the iterated one?
    if intervals[i][0] <= cur[1]:
        cur = (max(cur[0], intervals[i][0]), min(cur[1], intervals[i][1]), True)
    else:
        if cur[2]:
            result.append(time_to_minutes(cur[1]) - time_to_minutes(cur[0]))
        cur = (intervals[i][0], intervals[i][1], False)
if cur[2]:
    result.append(time_to_minutes(cur[1]) - time_to_minutes(cur[0]))
print(f"result = {result}") # [57, 3, 15]
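The extra 3 comes from 04:16 - 04:19, which is shared by only two of the dates (27th and 29th). If the goal is strictly the time shared by every date, as in the expected output of the question, one possible variation is to group the intervals by date and intersect the groups one after another. The following is only a sketch under some assumptions: the helper names (`to_minutes`, `intersect`, `common_overlap_minutes`) are made up for illustration, intervals within a single date are assumed not to overlap each other, no interval crosses midnight, and seconds are ignored. It uses plain `datetime` objects so that it runs on its own.

```python
from collections import defaultdict
from datetime import datetime


def to_minutes(ts):
    # Minutes since midnight; seconds are ignored (assumption).
    return ts.hour * 60 + ts.minute


def intersect(a, b):
    """Intersect two lists of (start, end) intervals given in minutes."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:  # keep only non-empty overlaps
                out.append((start, end))
    return sorted(out)


def common_overlap_minutes(records):
    """records: iterable of (date, start, end); start/end expose .hour and .minute."""
    per_day = defaultdict(list)
    for day, start, end in records:
        per_day[day].append((to_minutes(start), to_minutes(end)))
    days = list(per_day.values())
    if not days:
        return []
    common = days[0]
    for other in days[1:]:
        # Shrink the running result to what the next date also covers.
        common = intersect(common, other)
    return [end - start for start, end in common]


rows = [
    ["2023-12-27", "2023-12-27 00:00:00", "2023-12-27 02:14:00"],
    ["2023-12-27", "2023-12-27 03:16:00", "2023-12-27 04:19:00"],
    ["2023-12-27", "2023-12-27 18:11:00", "2023-12-27 20:13:00"],
    ["2023-12-28", "2023-12-28 01:16:00", "2023-12-28 02:14:00"],
    ["2023-12-28", "2023-12-28 02:16:00", "2023-12-28 02:28:00"],
    ["2023-12-28", "2023-12-28 02:30:00", "2023-12-28 02:56:00"],
    ["2023-12-28", "2023-12-28 18:45:00", "2023-12-28 19:00:00"],
    ["2023-12-29", "2023-12-29 01:16:00", "2023-12-29 02:13:00"],
    ["2023-12-29", "2023-12-29 04:16:00", "2023-12-29 05:09:00"],
    ["2023-12-29", "2023-12-29 05:11:00", "2023-12-29 05:14:00"],
    ["2023-12-29", "2023-12-29 18:00:00", "2023-12-29 19:00:00"],
]
records = [
    (d, datetime.fromisoformat(s), datetime.fromisoformat(e)) for d, s, e in rows
]
common_minutes = common_overlap_minutes(records)
print(common_minutes)  # [57, 15]
```

With the dataframe from the question, the same function should accept `zip(df["date"], df["Start_tmp"], df["End_tmp"])`, since pandas timestamps also expose `.hour` and `.minute`.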