Pandas resample of 5-minute stock data is misaligned

Question (votes: 0, answers: 1)

I have some 5-minute stock data, for example:

                  Date   Open   High    Low  Close    Volume
0  2024-11-19 09:35:00  11.75  11.79  11.55  11.78  32673600
1  2024-11-19 09:40:00  11.78  11.81  11.73  11.79  14802700
2  2024-11-19 09:45:00  11.79  11.84  11.79  11.82  13837400
3  2024-11-19 09:50:00  11.81  11.83  11.76  11.82   8534200
4  2024-11-19 09:55:00  11.82  11.87  11.80  11.87   8540500
5  2024-11-19 10:00:00  11.87  11.96  11.87  11.90  20659800
6  2024-11-19 10:05:00  11.89  11.90  11.82  11.82  11691000
7  2024-11-19 10:10:00  11.82  11.82  11.73  11.74   8762900
8  2024-11-19 10:15:00  11.74  11.74  11.71  11.73   6870500
9  2024-11-19 10:20:00  11.73  11.73  11.68  11.70   6244800
10 2024-11-19 10:25:00  11.70  11.70  11.66  11.69   5083000
11 2024-11-19 10:30:00  11.70  11.73  11.69  11.71   5342400
12 2024-11-19 10:35:00  11.72  11.74  11.71  11.73   3311800
13 2024-11-19 10:40:00  11.73  11.74  11.71  11.72   2331900
14 2024-11-19 10:45:00  11.72  11.72  11.70  11.72   3024100
15 2024-11-19 10:50:00  11.71  11.74  11.70  11.71   2774200
16 2024-11-19 10:55:00  11.70  11.72  11.70  11.71   1313000
17 2024-11-19 11:00:00  11.72  11.75  11.71  11.74   1737400
18 2024-11-19 11:05:00  11.75  11.75  11.73  11.75   1690600
19 2024-11-19 11:10:00  11.74  11.76  11.73  11.76   1751800
20 2024-11-19 11:15:00  11.76  11.76  11.72  11.73   2248700
21 2024-11-19 11:20:00  11.73  11.73  11.70  11.71   2464200
22 2024-11-19 11:25:00  11.71  11.71  11.69  11.70   1033600
23 2024-11-19 11:30:00  11.69  11.70  11.67  11.69   2063600

I use df.resample to convert it to 30-minute data with this code:

df = df.set_index('Date')
df = df.resample('30T').agg({'Open':'first', 'High':'max', 'Low':'min','Close':'last',
                                 'Volume':'sum'}, closed='right', label = 'right').dropna()

But I get this strange result:

                      Open   High    Low  Close    Volume
Date                                                     
2024-11-19 09:30:00  11.75  11.87  11.55  11.87  78388400
2024-11-19 10:00:00  11.87  11.96  11.66  11.69  59312000
2024-11-19 10:30:00  11.70  11.74  11.69  11.71  18097400
2024-11-19 11:00:00  11.72  11.76  11.69  11.70  10926300
2024-11-19 11:30:00  11.69  11.70  11.67  11.69   2063600

Here is the correct 30-minute data exported from my trading software:

Time              Open   High   Low    Close  Volume
2024/11/19-10:00  11.75  11.96  11.55  11.9   99048200
2024/11/19-10:30  11.89  11.9   11.66  11.71  43994600
2024/11/19-11:00  11.72  11.75  11.7   11.74  14492400
2024/11/19-11:30  11.75  11.76  11.67  11.69  11252500
 

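The 09:30 row does not matter; the problem is that the rows after it are wrong. I could not find any other df.resample parameters that help. How do I aggregate the data correctly?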

pandas dataframe resample
1 answer
0 votes

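By default, the bins are anchored to the start of the day (origin='start_day'), so they begin at 09:30, 10:00, 10:30, and so on. It looks like you want them anchored to the start of your data instead. Note also that closed and label are parameters of resample(), not agg(), so passing them to agg() does not change the binning; they are left out here:

(df.resample('30min', origin='start')
   .agg({'Open':'first', 'High':'max', 'Low':'min','Close':'last',
         'Volume':'sum'})
 .dropna()
)

With origin='start' the bins are [09:35, 10:05), [10:05, 10:35), and so on, so the values match the exported data, but each row is labelled with the start of its bin (09:35, 10:05, ...) rather than its end.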
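
If the row labels should also match the trading software (10:00, 10:30, ...), one option is to keep the default origin and pass closed='right' and label='right' to resample() itself rather than to agg(). The following is a minimal sketch under that assumption, rebuilt from the sample rows in the question; the names rows, ohlcv, and bars_30m are illustrative and do not come from the original post.

```python
import pandas as pd

# 5-minute bars copied from the question: (Date, Open, High, Low, Close, Volume).
rows = [
    ("2024-11-19 09:35:00", 11.75, 11.79, 11.55, 11.78, 32673600),
    ("2024-11-19 09:40:00", 11.78, 11.81, 11.73, 11.79, 14802700),
    ("2024-11-19 09:45:00", 11.79, 11.84, 11.79, 11.82, 13837400),
    ("2024-11-19 09:50:00", 11.81, 11.83, 11.76, 11.82, 8534200),
    ("2024-11-19 09:55:00", 11.82, 11.87, 11.80, 11.87, 8540500),
    ("2024-11-19 10:00:00", 11.87, 11.96, 11.87, 11.90, 20659800),
    ("2024-11-19 10:05:00", 11.89, 11.90, 11.82, 11.82, 11691000),
    ("2024-11-19 10:10:00", 11.82, 11.82, 11.73, 11.74, 8762900),
    ("2024-11-19 10:15:00", 11.74, 11.74, 11.71, 11.73, 6870500),
    ("2024-11-19 10:20:00", 11.73, 11.73, 11.68, 11.70, 6244800),
    ("2024-11-19 10:25:00", 11.70, 11.70, 11.66, 11.69, 5083000),
    ("2024-11-19 10:30:00", 11.70, 11.73, 11.69, 11.71, 5342400),
    ("2024-11-19 10:35:00", 11.72, 11.74, 11.71, 11.73, 3311800),
    ("2024-11-19 10:40:00", 11.73, 11.74, 11.71, 11.72, 2331900),
    ("2024-11-19 10:45:00", 11.72, 11.72, 11.70, 11.72, 3024100),
    ("2024-11-19 10:50:00", 11.71, 11.74, 11.70, 11.71, 2774200),
    ("2024-11-19 10:55:00", 11.70, 11.72, 11.70, 11.71, 1313000),
    ("2024-11-19 11:00:00", 11.72, 11.75, 11.71, 11.74, 1737400),
    ("2024-11-19 11:05:00", 11.75, 11.75, 11.73, 11.75, 1690600),
    ("2024-11-19 11:10:00", 11.74, 11.76, 11.73, 11.76, 1751800),
    ("2024-11-19 11:15:00", 11.76, 11.76, 11.72, 11.73, 2248700),
    ("2024-11-19 11:20:00", 11.73, 11.73, 11.70, 11.71, 2464200),
    ("2024-11-19 11:25:00", 11.71, 11.71, 11.69, 11.70, 1033600),
    ("2024-11-19 11:30:00", 11.69, 11.70, 11.67, 11.69, 2063600),
]
df = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Volume"])
df["Date"] = pd.to_datetime(df["Date"])  # resample() needs a datetime index
df = df.set_index("Date")

# Standard OHLCV aggregation, one rule per column.
ohlcv = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}

# closed/label are given to resample(): each bin is (start, end] and is
# labelled by its end, e.g. the 09:35 to 10:00 bars form the 10:00 row.
bars_30m = df.resample("30min", closed="right", label="right").agg(ohlcv).dropna()
print(bars_30m)
```

On the sample data this gives four rows labelled 10:00, 10:30, 11:00, and 11:30, with volumes 99048200, 43994600, 14492400, and 11252500, which agree with the exported table in the question.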