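I have been struggling to run a loop over every row and every column. As the loop walks down each column row by row, I want to calculate the compound return.
There are two DataFrames (df1 and df2): df1 holds the stock tickers and df2 holds the corresponding prices. I am trying to build a new DataFrame (df3) based on the 'if statement' logic listed below.
First DataFrame = df1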
Date 1 2 3 4 5
0 2000-12-05 PXX.TO MX.TO CAE.TO HRX.TO FR.TO
1 2000-12-06 PXX.TO MX.TO CAE.TO HRX.TO FR.TO
2 2000-12-07 FTS.TO MX.TO CAE.TO HRX.TO FR.TO
3 2000-12-08 FTS.TO MX.TO CAE.TO HRX.TO FR.TO
4 2000-12-09 FTS.TO G.TO CAE.TO HRX.TO TB.TO
5 2000-12-10 FTS.TO G.TO KYU.TO HRX.TO TB.TO
6 2000-12-11 FTS.TO G.TO KYU.TO HRX.TO TB.TO
7 2000-12-12 BAM-A.TO G.TO KYU.TO HRX.TO TB.TO
8 2000-12-13 BAM-A.TO PLI.TO KYU.TO HRX.TO TB.TO
9 2000-12-14 BAM-A.TO PLI.TO KYU.TO HRX.TO TB.TO
10 2000-12-15 BAM-A.TO PLI.TO KYU.TO HRX.TO TB.TO
Second DataFrame = df2
Date 1 2 3 4 5
0 2000-12-05 2.3 60.10 2.30 34.98 35.00
1 2000-12-06 2.35 60.70 2.38 35.43 35.01
2 2000-12-07 56.76 61.31 2.46 35.89 35.02
3 2000-12-08 57.33 61.92 2.54 36.35 35.04
4 2000-12-09 57.90 100.20 2.63 36.83 300.90
5 2000-12-10 58.48 101.00 69.56 37.30 304.18
6 2000-12-11 59.07 101.81 70.46 37.78 307.50
7 2000-12-12 4.50 102.62 71.37 38.27 310.85
8 2000-12-13 4.54 44.50 72.29 38.77 314.24
9 2000-12-14 4.57 45.39 73.23 39.27 317.66
10 2000-12-15 4.61 46.30 74.18 39.78 321.12
Desired output = df3
Date 1 2 3 4 5
0 2000-12-05 1.0000 1.0000 1.0000 1.0000 1.0000
1 2000-12-06 1.0200 1.0100 1.0340 1.0129 1.0003
2 2000-12-07 1.0200 1.0201 1.0692 1.0260 1.0007
3 2000-12-08 1.0302 1.0303 1.1055 1.0393 1.0010
4 2000-12-09 1.0405 1.0303 1.1431 1.0528 1.0010
5 2000-12-10 1.0509 1.0385 1.1431 1.0664 1.0119
6 2000-12-11 1.0614 1.0469 1.1579 1.0802 1.0230
7 2000-12-12 1.0614 1.0552 1.1729 1.0941 1.0341
8 2000-12-13 1.0699 1.0552 1.1880 1.1083 1.0454
9 2000-12-14 1.0785 1.0763 1.2034 1.1226 1.0568
10 2000-12-15 1.0871 1.0979 1.2190 1.1371 1.0683
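The formulas for the values in column 1 of df3 are shown below. When the ticker changes from one row to the next, that row's return is treated as flat (a factor of 1):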
df3.row[0] = 1
df3.row[1] = (2.35/2.30) * 1 = 1.0200
df3.row[2] = (56.76/56.76) * 1.0200 = 1.0200
df3.row[3] = (57.33/56.76) * 1.0200 = 1.0302
df3.row[4] = (57.90/57.33) * 1.0302 = 1.0405
df3.row[5] = (58.48/57.90) * 1.0405 = 1.0509
df3.row[6] = (59.07/58.48) * 1.0509 = 1.0614
df3.row[7] = (4.50/4.50) * 1.0614 = 1.0614
df3.row[8] = (4.54/4.50) * 1.0614 = 1.0699
df3.row[9] = (4.57/4.54) * 1.0699 = 1.0785
df3.row[10] = (4.61/4.57) * 1.0785 = 1.0871
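Read as a recurrence, the rows above seem to follow a single rule per column. The notation here is an assumption introduced for illustration, not something from the post: s_t is the ticker in row t of df1, p_t is the price in row t of df2, and R_t is the value in row t of df3.

```latex
R_0 = 1, \qquad
R_t = R_{t-1} \times
\begin{cases}
p_t / p_{t-1} & \text{if } s_t = s_{t-1},\\
1 & \text{if } s_t \neq s_{t-1}.
\end{cases}
```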
Here is what I have so far. I am not convinced it is the best approach.
StartFromDay = 1
NumOfHoldings = 10
df3 = pd.DataFrame(columns = np.arange(1,NumOfHoldings+1))
df3.index.names = ['Date']
for col in df1.columns:
    #First row should equal 1
    df3.iloc[0][col] == 1
    for i in range(StartFromDay, len(df1)):
        #first row of each column
        prevrow = df1.iloc[0][col]
        if df1.iloc[i][col] == prevrow:
            ###### If Statements to calculate compound return#######
Loops are slow, so let's do this in a vectorized way. First, set the index appropriately:
df1.set_index('Date', inplace=True)
df2.set_index('Date', inplace=True)
Next, build a boolean mask that is True wherever the symbol is the same as in the previous row:
same_stock = df1.iloc[1:].values == df1.iloc[:-1].values
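We have to use values because the two shifted slices carry different index labels, so pandas would try to align them by label instead of comparing them position by position.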
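To illustrate why the raw arrays are needed, here is a minimal sketch on a made-up single-column frame (the numbers are not from the question). It uses division rather than comparison because division shows the label alignment quietly producing the wrong result; in the pandas versions I am aware of, the `==` comparison is stricter still and refuses differently-labeled frames.

```python
import numpy as np
import pandas as pd

# Toy single-column price frame (made-up numbers, not the data from the question).
prices = pd.DataFrame({"p": [2.0, 4.0, 8.0]})

# Label-aligned division: each label is divided by the SAME label on the other
# side, and labels present on only one side come out as NaN.
aligned = prices.iloc[1:] / prices.iloc[:-1]

# Positional division on the raw arrays: row i is divided by row i-1.
positional = prices.iloc[1:].values / prices.iloc[:-1].values

print(aligned)
print(positional)
```

Only the positional version gives the row-over-row ratios that the rest of the approach relies on.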
Then build a matrix of all the df2.row[1]/df2.row[0]-style ratios, that is, each row's price divided by the previous row's price:
ret = df2.iloc[1:].values / df2.iloc[:-1].values
Next, overwrite the returns at the positions where the symbol changed:
ret[~same_stock] = 1 # pretend return is flat when symbol changed
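As a quick check of what the mask assignment does, here is a small sketch on hypothetical tickers and prices (the names `symbols`, `prices`, and `flat` are made up for illustration). It also shows `np.where` as a non-mutating alternative to the in-place assignment; that variant is a suggestion, not part of the approach above.

```python
import numpy as np

# Hypothetical data: column 0 switches from "AAA" to "BBB" at row 2.
symbols = np.array([["AAA", "XXX"],
                    ["AAA", "XXX"],
                    ["BBB", "XXX"]])
prices = np.array([[10.0, 5.0],
                   [11.0, 6.0],
                   [50.0, 9.0]])

same_stock = symbols[1:] == symbols[:-1]   # True where the ticker is unchanged
ret = prices[1:] / prices[:-1]             # row-over-row price ratios

# Non-mutating variant: keep the ratio where the ticker is unchanged, else 1.
flat = np.where(same_stock, ret, 1.0)

# In-place variant: overwrite the ratios at the positions where the ticker changed.
ret[~same_stock] = 1

print(flat)
```

Both variants should leave the same matrix; the jump from 11.0 to 50.0 in column 0 is ignored because it belongs to a different ticker.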
Now create a DataFrame with the results, adding a first row of ones so that every column starts at 1:
simpret = pd.DataFrame(np.vstack((np.ones(ret.shape[1]), ret)), index=df1.index, columns=df1.columns)
df3 = simpret.cumprod()
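Putting the steps together, a self-contained sketch might look like the following. It uses the first five rows of columns 1 and 2 from the question. The helper name `compound_returns` is hypothetical, and it assumes both frames share the same index and column labels. The last lines show a pandas-only variant built on `shift` and `where`, which is my own alternative rather than part of the approach above; it should give the same numbers while keeping the labels aligned throughout.

```python
import numpy as np
import pandas as pd


def compound_returns(symbols: pd.DataFrame, prices: pd.DataFrame) -> pd.DataFrame:
    """Cumulative return per column, treating a ticker change as a flat day.

    Hypothetical helper: both frames are assumed to share the same index
    and column labels.
    """
    # Compare and divide positionally (row i against row i-1).
    same_stock = symbols.iloc[1:].values == symbols.iloc[:-1].values
    ret = prices.iloc[1:].values / prices.iloc[:-1].values
    ret[~same_stock] = 1.0                      # flat return when the ticker changed
    first = np.ones((1, ret.shape[1]))          # every column starts at 1
    simple = pd.DataFrame(np.vstack((first, ret)),
                          index=prices.index, columns=prices.columns)
    return simple.cumprod()


# First five rows of columns 1 and 2 from the question.
dates = pd.date_range("2000-12-05", periods=5, name="Date")
df1 = pd.DataFrame({1: ["PXX.TO", "PXX.TO", "FTS.TO", "FTS.TO", "FTS.TO"],
                    2: ["MX.TO", "MX.TO", "MX.TO", "MX.TO", "G.TO"]},
                   index=dates)
df2 = pd.DataFrame({1: [2.30, 2.35, 56.76, 57.33, 57.90],
                    2: [60.10, 60.70, 61.31, 61.92, 100.20]},
                   index=dates)

df3 = compound_returns(df1, df2)

# pandas-only variant: shift() keeps the labels, so no .values is needed.
same = df1.eq(df1.shift())                      # first row compares against NaN -> False
df3_alt = df2.div(df2.shift()).where(same, 1.0).cumprod()

print(df3.round(4))
```

The NumPy version avoids alignment overhead, while the `shift`-based version is shorter and keeps the index and column labels for free; which one reads better is a matter of taste.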