python pandas: simple row numbering per month

Question

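I have not been able to find a solution for a trading research problem.

I have a large market dataset running from 1993 to 2024.

Calendar days do not match trading days. This could perhaps be done with "business days", but who knows whether holidays/business days actually match trading days. I need the trading day of the month for each row so that I can check whether certain conditions are met.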
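To make the mismatch concrete, here is a minimal sketch that compares pandas' default business days with a few dates taken from the sample data. The name `trading_dates` is only illustrative, and the default business-day calendar used here knows nothing about exchange holidays:

```python
import pandas as pd

# A few dates copied from the sample data (mid-January 2010).
# `trading_dates` is an illustrative name, not part of the original post.
trading_dates = pd.to_datetime(
    ["2010-01-14", "2010-01-15", "2010-01-19", "2010-01-20"]
)

# Default business days (Monday to Friday, no holiday calendar) over the same span.
business_days = pd.bdate_range("2010-01-14", "2010-01-20")

# Business days for which the market data has no row.
missing = business_days.difference(trading_dates)
print(missing)
```

Here Monday 2010-01-18 counts as a business day but has no row in the data, which is why counting the rows that actually exist is safer than counting business days.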

                  Day        Open        High         Low       Close     Volume  trading_day
Date
2010-01-04     Monday  112.370003  113.389999  111.510002  113.330002  118944600            1
2010-01-05    Tuesday  113.260002  113.680000  112.849998  113.629997  111579900            2
2010-01-06  Wednesday  113.519997  113.989998  113.430000  113.709999  116074400            3
2010-01-07   Thursday  113.500000  114.330002  113.180000  114.190002  131091100            4
2010-01-08     Friday  113.889999  114.620003  113.660004  114.570000  126402800            5
2010-01-11     Monday  115.080002  115.129997  114.239998  114.730003  106375700            6
2010-01-12    Tuesday  113.970001  114.209999  113.220001  113.660004  163333500            7
2010-01-13  Wednesday  113.949997  114.940002  113.370003  114.620003  161822000            8
2010-01-14   Thursday  114.489998  115.139999  114.419998  114.930000  115718800            9
2010-01-15     Friday  114.730003  114.839996  113.199997  113.639999  212283100           10
2010-01-19    Tuesday  113.620003  115.129997  113.589996  115.059998  139172700           11
2010-01-20  Wednesday  114.279999  114.449997  112.980003  113.889999  216490200           12
2010-01-21   Thursday  113.919998  114.269997  111.559998  111.699997  344859600           13
2010-01-22     Friday  111.199997  111.739998  109.089996  109.209999  345942400           14
2010-01-25     Monday  110.209999  110.410004  109.410004  109.769997  186937500           15
2010-01-26    Tuesday  109.339996  110.470001  109.040001  109.309998  211168800           16
2010-01-27  Wednesday  109.169998  110.080002  108.330002  109.830002  271863600           17
2010-01-28   Thursday  110.190002  110.250000  107.910004  108.570000  316104000           18
2010-01-29     Friday  109.040001  109.800003  107.220001  107.389999  310677600           19

Above is an example of what I want to achieve. I produced it by selecting a single month for testing and using the following code.

df_test['trading_day'] = range(1, len(df_test) + 1)

Of course, there have been more than 300 months since 1993, so doing all of them by hand would be hell.

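I managed to group the data by month with df.groupby, using the following code:

df_grouped_monthly = df.groupby(pd.Grouper(freq='M'))

and then tried to apply the same counter as above. Sadly, it does not work on a grouped df. I tried

.transform(add(1))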

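A possible solution is to get the row position within each group. I do not know whether the row number is inherited from the initial df or counted per groupby group.

Alternatively, the counter mentioned above could be added to the grouped df, but I failed to add an extra column to the grouped df.

Any suggestions on how to achieve the above on a per-month basis across the whole dataset would be appreciated.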
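On the question of where the row numbers come from: `groupby(...).cumcount()` numbers the rows within each group, starting at 0, in the order they appear in the frame, so the count restarts per group rather than being inherited from the full df. A minimal sketch, assuming the index is a DatetimeIndex as in the sample; the tiny frame and its `Close` values are placeholders, not real prices:

```python
import pandas as pd

# Stand-in for the full dataset: a DatetimeIndex spanning two months.
# The Close values are placeholders, not real market data.
idx = pd.to_datetime(
    ["2010-01-28", "2010-01-29", "2010-02-01", "2010-02-02", "2010-02-03"]
)
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# cumcount follows row order, so make sure the rows are chronological first.
df = df.sort_index()

# One group per calendar month (year and month together).
month = df.index.to_period("M")

# cumcount starts at 0 inside each group; add 1 for a 1-based trading day.
df["trading_day"] = df.groupby(month).cumcount() + 1
print(df)
```

The result is an ordinary Series aligned with the original index, so it can be assigned straight back as a new column without touching the grouped object.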

python pandas counter
1 Answer

You can convert the 'Date' column to datetime and create new 'Year', 'Month' and 'Day' columns before grouping.

import pandas as pd

# Parse the ISO-formatted 'Date' strings (e.g. 2010-01-04) once and reuse the result.
dates = pd.to_datetime(df["Date"], errors="coerce")
df["Year"] = dates.dt.year
df["Month"] = dates.dt.month
df["Day"] = dates.dt.day
df = df.sort_values(by=["Year", "Month", "Day"])

Output DataFrame:

Date        Day              Open        High         Low       Close     Volume  trading_day  Year  Month  Day
2010-01-14  Thursday   114.489998  115.139999  114.419998      114.93  115718800            9  2010      1   14
2010-01-11  Monday     115.080002  115.129997  114.239998  114.730003  106375700            6  2010      1   11
2010-01-08  Friday     113.889999  114.620003  113.660004      114.57  126402800            5  2010      1    8
2010-01-19  Tuesday    113.620003  115.129997  113.589996  115.059998  139172700           11  2010      1   19
2010-01-06  Wednesday  113.519997  113.989998      113.43  113.709999  116074400            3  2010      1    6
2010-01-13  Wednesday  113.949997  114.940002  113.370003  114.620003  161822000            8  2010      1   13
2010-01-12  Tuesday    113.970001  114.209999  113.220001  113.660004  163333500            7  2010      1   12
2010-01-15  Friday     114.730003  114.839996  113.199997  113.639999  212283100           10  2010      1   15
2010-01-07  Thursday        113.5  114.330002      113.18  114.190002  131091100            4  2010      1    7
2010-01-20  Wednesday  114.279999  114.449997  112.980003  113.889999  216490200           12  2010      1   20
2010-01-05  Tuesday    113.260002      113.68  112.849998  113.629997  111579900            2  2010      1    5
2010-01-21  Thursday   113.919998  114.269997  111.559998  111.699997  344859600           13  2010      1   21
2010-01-04  Monday     112.370003  113.389999  111.510002  113.330002  118944600            1  2010      1    4
2010-01-25  Monday     110.209999  110.410004  109.410004  109.769997  186937500           15  2010      1   25
2010-01-22  Friday     111.199997  111.739998  109.089996  109.209999  345942400           14  2010      1   22
2010-01-26  Tuesday    109.339996  110.470001  109.040001  109.309998  211168800           16  2010      1   26
2010-01-27  Wednesday  109.169998  110.080002  108.330002  109.830002  271863600           17  2010      1   27
2010-01-28  Thursday   110.190002      110.25  107.910004      108.57  316104000           18  2010      1   28
2010-01-29  Friday     109.040001  109.800003  107.220001  107.389999  310677600           19  2010      1   29
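The grouping step itself is not shown above. One possible way to finish it, sketched here as an assumption rather than as the answerer's method, is to group on the 'Year' and 'Month' columns and number the rows with `cumcount()`. The four dates are illustrative and deliberately out of order, and `first_days` is a hypothetical name for one example condition:

```python
import pandas as pd

# Illustrative input: a 'Date' column of ISO strings, deliberately unsorted.
df = pd.DataFrame(
    {"Date": ["2010-02-01", "2010-01-29", "2010-01-28", "2010-02-02"]}
)

# Parse once, then derive the grouping columns.
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

# cumcount follows row order, so sort chronologically before numbering.
df = df.sort_values("Date").reset_index(drop=True)

# 1-based counter that restarts for every (Year, Month) pair.
df["trading_day"] = df.groupby(["Year", "Month"]).cumcount() + 1

# Example condition: rows that fall on the first trading day of a month.
first_days = df[df["trading_day"] == 1]
print(df)
print(first_days)
```

Grouping on both 'Year' and 'Month' matters for a dataset that spans 1993 to 2024: grouping on 'Month' alone would lump every January together.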