I have some 5-minute stock data, for example:
Date Open High Low Close Volume
0 2024-11-19 09:35:00 11.75 11.79 11.55 11.78 32673600
1 2024-11-19 09:40:00 11.78 11.81 11.73 11.79 14802700
2 2024-11-19 09:45:00 11.79 11.84 11.79 11.82 13837400
3 2024-11-19 09:50:00 11.81 11.83 11.76 11.82 8534200
4 2024-11-19 09:55:00 11.82 11.87 11.80 11.87 8540500
5 2024-11-19 10:00:00 11.87 11.96 11.87 11.90 20659800
6 2024-11-19 10:05:00 11.89 11.90 11.82 11.82 11691000
7 2024-11-19 10:10:00 11.82 11.82 11.73 11.74 8762900
8 2024-11-19 10:15:00 11.74 11.74 11.71 11.73 6870500
9 2024-11-19 10:20:00 11.73 11.73 11.68 11.70 6244800
10 2024-11-19 10:25:00 11.70 11.70 11.66 11.69 5083000
11 2024-11-19 10:30:00 11.70 11.73 11.69 11.71 5342400
12 2024-11-19 10:35:00 11.72 11.74 11.71 11.73 3311800
13 2024-11-19 10:40:00 11.73 11.74 11.71 11.72 2331900
14 2024-11-19 10:45:00 11.72 11.72 11.70 11.72 3024100
15 2024-11-19 10:50:00 11.71 11.74 11.70 11.71 2774200
16 2024-11-19 10:55:00 11.70 11.72 11.70 11.71 1313000
17 2024-11-19 11:00:00 11.72 11.75 11.71 11.74 1737400
18 2024-11-19 11:05:00 11.75 11.75 11.73 11.75 1690600
19 2024-11-19 11:10:00 11.74 11.76 11.73 11.76 1751800
20 2024-11-19 11:15:00 11.76 11.76 11.72 11.73 2248700
21 2024-11-19 11:20:00 11.73 11.73 11.70 11.71 2464200
22 2024-11-19 11:25:00 11.71 11.71 11.69 11.70 1033600
23 2024-11-19 11:30:00 11.69 11.70 11.67 11.69 2063600
I use df.resample to convert it to 30-minute bars with this code:
df = df.set_index('Date')
df = df.resample('30T').agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last',
                             'Volume': 'sum'}, closed='right', label='right').dropna()
But I get this strange result:
Open High Low Close Volume
Date
2024-11-19 09:30:00 11.75 11.87 11.55 11.87 78388400
2024-11-19 10:00:00 11.87 11.96 11.66 11.69 59312000
2024-11-19 10:30:00 11.70 11.74 11.69 11.71 18097400
2024-11-19 11:00:00 11.72 11.76 11.69 11.70 10926300
2024-11-19 11:30:00 11.69 11.70 11.67 11.69 2063600
Here is the correct 30-minute data exported from my trading software:
Time Open High Low Close Volume
2024/11/19-10:00 11.75 11.96 11.55 11.9 99048200
2024/11/19-10:30 11.89 11.9 11.66 11.71 43994600
2024/11/19-11:00 11.72 11.75 11.7 11.74 14492400
2024/11/19-11:30 11.75 11.76 11.67 11.69 11252500
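The 09:30 row itself doesn't matter; the main problem is that the values in the rows after it are wrong. I couldn't find any further parameters for df.resample that would help. How do I aggregate the data correctly?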
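For reference, the strange output is exactly what resample produces with its defaults for a 30-minute rule, closed='left' and label='left': the first bin is [09:30, 10:00), holds the 09:35 to 09:55 bars, and is labelled 09:30. In other words, the closed/label keywords passed to .agg() were not applied; they are parameters of resample() itself. The minimal sketch below checks this on the Volume column only; rebuilding the data with pd.date_range is an assumption for illustration, not the asker's loading code.

```python
import pandas as pd

# Rebuild the Volume column of the 5-minute sample: 24 bars, 09:35 ... 11:30.
idx = pd.date_range('2024-11-19 09:35', periods=24, freq='5min', name='Date')
volume = pd.Series([
    32673600, 14802700, 13837400, 8534200, 8540500, 20659800,
    11691000, 8762900, 6870500, 6244800, 5083000, 5342400,
    3311800, 2331900, 3024100, 2774200, 1313000, 1737400,
    1690600, 1751800, 2248700, 2464200, 1033600, 2063600,
], index=idx, name='Volume')

# No closed/label given, so pandas uses left-closed, left-labelled bins:
# [09:30, 10:00) -> 09:30, [10:00, 10:30) -> 10:00, ...
default_bins = volume.resample('30min').sum()
print(default_bins)
```

The printed volumes are the same as the Volume column of the strange result, which is consistent with the keywords having been dropped rather than the data being wrong.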
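By default, the bins are anchored to the start of the day (origin='start_day'). It looks like you want them anchored to the start of the data instead: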
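# Assumes df is already indexed by Date, as in the question.
# origin='start' anchors the 30-minute bins at the first timestamp (09:35),
# giving [09:35, 10:05), [10:05, 10:35), ...; the aggregated values match the
# broker's bars, but each row is labelled with its bin start (09:35, 10:05, ...).
# closed/label are omitted: passed to .agg() they are not applied anyway.
(df.resample('30min', origin='start')
   .agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last',
         'Volume': 'sum'})
   .dropna()
)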
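An alternative sketch, assuming each 5-minute timestamp is the bar's end time (as the 09:35 first bar suggests): keep the default origin and pass closed='right' and label='right' to resample() instead of .agg(). The bins then become (09:30, 10:00], (10:00, 10:30], ... and each is stamped with its right edge, which reproduces the four bars from the trading software, labels included. The DataFrame construction and the name ohlcv are illustrative, not taken from the question.

```python
import pandas as pd

# 5-minute sample from the question; each timestamp is the bar's END time.
idx = pd.date_range('2024-11-19 09:35', periods=24, freq='5min', name='Date')
df = pd.DataFrame({
    'Open':  [11.75, 11.78, 11.79, 11.81, 11.82, 11.87, 11.89, 11.82,
              11.74, 11.73, 11.70, 11.70, 11.72, 11.73, 11.72, 11.71,
              11.70, 11.72, 11.75, 11.74, 11.76, 11.73, 11.71, 11.69],
    'High':  [11.79, 11.81, 11.84, 11.83, 11.87, 11.96, 11.90, 11.82,
              11.74, 11.73, 11.70, 11.73, 11.74, 11.74, 11.72, 11.74,
              11.72, 11.75, 11.75, 11.76, 11.76, 11.73, 11.71, 11.70],
    'Low':   [11.55, 11.73, 11.79, 11.76, 11.80, 11.87, 11.82, 11.73,
              11.71, 11.68, 11.66, 11.69, 11.71, 11.71, 11.70, 11.70,
              11.70, 11.71, 11.73, 11.73, 11.72, 11.70, 11.69, 11.67],
    'Close': [11.78, 11.79, 11.82, 11.82, 11.87, 11.90, 11.82, 11.74,
              11.73, 11.70, 11.69, 11.71, 11.73, 11.72, 11.72, 11.71,
              11.71, 11.74, 11.75, 11.76, 11.73, 11.71, 11.70, 11.69],
    'Volume': [32673600, 14802700, 13837400, 8534200, 8540500, 20659800,
               11691000, 8762900, 6870500, 6244800, 5083000, 5342400,
               3311800, 2331900, 3024100, 2774200, 1313000, 1737400,
               1690600, 1751800, 2248700, 2464200, 1033600, 2063600],
}, index=idx)

ohlcv = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}

# closed/label belong to resample(), not agg(): right-closed bins
# (09:30, 10:00], (10:00, 10:30], ... labelled by their right edge.
bars_30m = (df.resample('30min', closed='right', label='right')
              .agg(ohlcv)
              .dropna())  # drops empty bins, e.g. across a lunch break
print(bars_30m)
```

Mixing the two fixes is not advisable here: with origin='start' and closed='right' together, the bin edges fall on 09:35, 10:05, ..., so the 09:35 bar would end up alone in the bin (09:05, 09:35].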