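I have a DataFrame with a datetime index (millisecond precision) and a price column. I want to create a new column that holds, for each row, the price closest to the current one among the rows in the previous 20 seconds. For example:
Data Hora Price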
2024-02-01 10:03:39.483 1.880
2024-02-01 10:03:40.540 1.900
2024-02-01 10:03:41.550 1.880
2024-02-01 10:03:43.563 1.890
2024-02-01 10:03:45.567 1.870
2024-02-01 10:03:45.583 1.890
2024-02-01 10:03:46.590 1.900
2024-02-01 10:03:48.620 1.930
2024-02-01 10:03:50.627 1.880
2024-02-01 10:03:51.630 1.890
2024-02-01 10:03:53.647 1.880
2024-02-01 10:03:55.753 1.900
2024-02-01 10:03:59.367 1.890
2024-02-01 10:04:02.497 1.910
2024-02-01 10:04:04.543 1.890
2024-02-01 10:04:05.550 1.860
2024-02-01 10:04:07.577 1.840
2024-02-01 10:04:08.157 1.850
2024-02-01 10:04:10.197 1.880
2024-02-01 10:04:11.887 1.910
2024-02-01 10:04:13.163 1.920
One option is a custom function with rolling.apply over a 20-second time window:
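import pandas as pd

# Make sure the timestamp column has a datetime dtype.
df['Data Hora'] = pd.to_datetime(df['Data Hora'])

def closest(s):
    # s holds the prices in the 20-second window; the last item is the current row.
    if len(s) > 1:
        # Absolute distance of each earlier price from the current price.
        diff = s.iloc[:-1].sub(s.iloc[-1]).abs()
        return s.loc[diff.idxmin()]
    else:
        # No earlier row inside the window.
        return float('nan')

df['closest_price_20s'] = df.rolling('20s', on='Data Hora')['Price'].apply(closest)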
Output:
Data Hora Price closest_price_20s
0 2024-02-01 10:03:39.483 1.88 NaN
1 2024-02-01 10:03:40.540 1.90 1.88
2 2024-02-01 10:03:41.550 1.88 1.88
3 2024-02-01 10:03:43.563 1.89 1.88
4 2024-02-01 10:03:45.567 1.87 1.88
5 2024-02-01 10:03:45.583 1.89 1.89
6 2024-02-01 10:03:46.590 1.90 1.90
7 2024-02-01 10:03:48.620 1.93 1.90
8 2024-02-01 10:03:50.627 1.88 1.88
9 2024-02-01 10:03:51.630 1.89 1.89
10 2024-02-01 10:03:53.647 1.88 1.88
11 2024-02-01 10:03:55.753 1.90 1.90
12 2024-02-01 10:03:59.367 1.89 1.89
13 2024-02-01 10:04:02.497 1.91 1.90
14 2024-02-01 10:04:04.543 1.89 1.89
15 2024-02-01 10:04:05.550 1.86 1.87
16 2024-02-01 10:04:07.577 1.84 1.86
17 2024-02-01 10:04:08.157 1.85 1.86
18 2024-02-01 10:04:10.197 1.88 1.88
19 2024-02-01 10:04:11.887 1.91 1.91
20 2024-02-01 10:04:13.163 1.92 1.91
Graphical output: [image not included]
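As a possible variation, the same lookup can be sketched with raw=True, so that the window arrives as a NumPy array and the nearest earlier price is picked by position rather than by index label. This is only a sketch under assumptions: closest_raw is a hypothetical helper name, and the DataFrame below is rebuilt from the first six rows of the sample data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# First six rows of the sample data from the question.
df = pd.DataFrame({
    "Data Hora": pd.to_datetime([
        "2024-02-01 10:03:39.483",
        "2024-02-01 10:03:40.540",
        "2024-02-01 10:03:41.550",
        "2024-02-01 10:03:43.563",
        "2024-02-01 10:03:45.567",
        "2024-02-01 10:03:45.583",
    ]),
    "Price": [1.88, 1.90, 1.88, 1.89, 1.87, 1.89],
})


def closest_raw(a: np.ndarray) -> float:
    """Return the earlier price in the window nearest to the last (current) one."""
    if a.size < 2:
        # Only the current row is inside the window.
        return np.nan
    prev = a[:-1]
    # argmin returns the first minimum, so ties go to the oldest row.
    return prev[np.abs(prev - a[-1]).argmin()]


# raw=True passes each window to the function as a plain ndarray.
df["closest_price_20s"] = (
    df.rolling("20s", on="Data Hora")["Price"].apply(closest_raw, raw=True)
)
print(df)
```

Because the selection is positional, this variant does not depend on the timestamps being unique, and skipping the per-window Series construction is generally faster on large frames.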