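I haven't been able to find a solution for my trading research.
I have a large market dataset running from 1993 to 2024.
Calendar days do not line up with trading days. It could probably be done with business days, but there is no guarantee that holidays/business days match the actual trading days. I need to get the trading day of the month, to be used when certain conditions are met.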
Day Open High Low Close Volume trading_day
Date
2010-01-04 Monday 112.370003 113.389999 111.510002 113.330002 118944600 1
2010-01-05 Tuesday 113.260002 113.680000 112.849998 113.629997 111579900 2
2010-01-06 Wednesday 113.519997 113.989998 113.430000 113.709999 116074400 3
2010-01-07 Thursday 113.500000 114.330002 113.180000 114.190002 131091100 4
2010-01-08 Friday 113.889999 114.620003 113.660004 114.570000 126402800 5
2010-01-11 Monday 115.080002 115.129997 114.239998 114.730003 106375700 6
2010-01-12 Tuesday 113.970001 114.209999 113.220001 113.660004 163333500 7
2010-01-13 Wednesday 113.949997 114.940002 113.370003 114.620003 161822000 8
2010-01-14 Thursday 114.489998 115.139999 114.419998 114.930000 115718800 9
2010-01-15 Friday 114.730003 114.839996 113.199997 113.639999 212283100 10
2010-01-19 Tuesday 113.620003 115.129997 113.589996 115.059998 139172700 11
2010-01-20 Wednesday 114.279999 114.449997 112.980003 113.889999 216490200 12
2010-01-21 Thursday 113.919998 114.269997 111.559998 111.699997 344859600 13
2010-01-22 Friday 111.199997 111.739998 109.089996 109.209999 345942400 14
2010-01-25 Monday 110.209999 110.410004 109.410004 109.769997 186937500 15
2010-01-26 Tuesday 109.339996 110.470001 109.040001 109.309998 211168800 16
2010-01-27 Wednesday 109.169998 110.080002 108.330002 109.830002 271863600 17
2010-01-28 Thursday 110.190002 110.250000 107.910004 108.570000 316104000 18
2010-01-29 Friday 109.040001 109.800003 107.220001 107.389999 310677600 19
Above is an example of what I want to achieve. I got it by selecting a single month for testing and using the following code:
df_test['trading_day'] = range(1, len(df_test) + 1)
Of course, there have been 300+ months since 1993, so doing them all by hand would be hell.
I managed to group the data by month with df.groupby:
df_grouped_monthly = df.groupby(pd.Grouper(freq='M'))
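and tried to apply the same counter as above. Sadly, it doesn't work on the grouped df. I tried .transform(add(1)).
A possible solution would be to get the row position within each group, though I don't know whether row numbers are inherited from the initial df or counted per group.
Another option would be to add the counter mentioned above to the grouped df, but I failed to add an extra column to the grouped df.
Any suggestions on how to achieve the above on a monthly basis across the whole dataset?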
You can convert the 'Date' column to datetime and create new 'Year', 'Month' and 'Day' columns before grouping.
import pandas as pd
# ISO date strings parse without an explicit format; format='%b' would coerce them all to NaT
dates = pd.to_datetime(df["Date"])
df["Year"] = dates.dt.year
df["Month"] = dates.dt.month
df["Day"] = dates.dt.day
df = df.sort_values(by=["Year", "Month", "Day"])
Output DataFrame:
Date Day Open High Low Close Volume trading_day Year Month Day
2010-01-14 Thursday 114.489998 115.139999 114.419998 114.93 115718800 9 2010 1 14
2010-01-11 Monday 115.080002 115.129997 114.239998 114.730003 106375700 6 2010 1 11
2010-01-08 Friday 113.889999 114.620003 113.660004 114.57 126402800 5 2010 1 8
2010-01-19 Tuesday 113.620003 115.129997 113.589996 115.059998 139172700 11 2010 1 19
2010-01-06 Wednesday 113.519997 113.989998 113.43 113.709999 116074400 3 2010 1 6
2010-01-13 Wednesday 113.949997 114.940002 113.370003 114.620003 161822000 8 2010 1 13
2010-01-12 Tuesday 113.970001 114.209999 113.220001 113.660004 163333500 7 2010 1 12
2010-01-15 Friday 114.730003 114.839996 113.199997 113.639999 212283100 10 2010 1 15
2010-01-07 Thursday 113.5 114.330002 113.18 114.190002 131091100 4 2010 1 7
2010-01-20 Wednesday 114.279999 114.449997 112.980003 113.889999 216490200 12 2010 1 20
2010-01-05 Tuesday 113.260002 113.68 112.849998 113.629997 111579900 2 2010 1 5
2010-01-21 Thursday 113.919998 114.269997 111.559998 111.699997 344859600 13 2010 1 21
2010-01-04 Monday 112.370003 113.389999 111.510002 113.330002 118944600 1 2010 1 4
2010-01-25 Monday 110.209999 110.410004 109.410004 109.769997 186937500 15 2010 1 25
2010-01-22 Friday 111.199997 111.739998 109.089996 109.209999 345942400 14 2010 1 22
2010-01-26 Tuesday 109.339996 110.470001 109.040001 109.309998 211168800 16 2010 1 26
2010-01-27 Wednesday 109.169998 110.080002 108.330002 109.830002 271863600 17 2010 1 27
2010-01-28 Thursday 110.190002 110.25 107.910004 108.57 316104000 18 2010 1 28
2010-01-29 Friday 109.040001 109.800003 107.220001 107.389999 310677600 19 2010 1 29
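With 'Year' and 'Month' available, one way to get the per-month counter from the question is groupby(...).cumcount(), which numbers the rows inside each group starting at 0. The snippet below is only a minimal sketch under a few assumptions: 'Date' is a regular column, the six sample rows and their 'Close' values are hypothetical placeholders (not real prices), and each row represents exactly one trading day.

```python
import pandas as pd

# Hypothetical sample spanning two months; Close values are placeholders.
df = pd.DataFrame({
    "Date": ["2010-01-04", "2010-01-05", "2010-01-06",
             "2010-01-29", "2010-02-01", "2010-02-02"],
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Parse the dates and derive the grouping keys.
dates = pd.to_datetime(df["Date"])
df["Year"] = dates.dt.year
df["Month"] = dates.dt.month

# Make sure rows are in chronological order before counting.
df = df.sort_values("Date").reset_index(drop=True)

# cumcount() numbers rows within each (Year, Month) group from 0,
# so adding 1 gives the trading day of the month.
df["trading_day"] = df.groupby(["Year", "Month"]).cumcount() + 1

print(df[["Date", "trading_day"]])
```

If 'Date' is the DatetimeIndex, as in the question's printout, the same idea should work without the helper columns by grouping on [df.index.year, df.index.month] instead.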