Looping through rows and columns to calculate compound returns in a new DataFrame

Problem description

I've been struggling to run a loop over every row and every column. While looping over each row, I want to calculate the compound return.

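There are two DataFrames (df1 and df2): df1 holds the stock tickers and df2 holds the corresponding prices. I'm trying to build a new DataFrame (df3) based on the 'if statements' listed below.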

  • If df1.row[1] == df1.row[0], then df3[1] = (df2.row[1] / df2.row[0]) * df3[0]
  • If df1.row[1] != df1.row[0], then df3[1] = df3[0]
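
The two rules amount to a running product that is carried forward unchanged on rows where the ticker changes. A minimal pure-Python sketch for a single column follows; the function name `compound_column` and the sample tickers and prices are made up for illustration and are not part of the question.

```python
def compound_column(tickers, prices):
    """Running compound return for one column, starting at 1.0.

    The value is carried forward unchanged whenever the ticker
    differs from the previous row.
    """
    out = [1.0]
    for i in range(1, len(tickers)):
        if tickers[i] == tickers[i - 1]:
            # same stock: scale by this row's price over the previous row's
            out.append(out[-1] * prices[i] / prices[i - 1])
        else:
            # stock was swapped: no return is booked for this row
            out.append(out[-1])
    return out


# hypothetical data: "AAA" is replaced by "BBB" on the third row
example = compound_column(["AAA", "AAA", "BBB", "BBB"], [10.0, 11.0, 50.0, 55.0])
print(example)
```

The jump from 11.0 to 50.0 is ignored because it compares prices of two different stocks.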

First DataFrame = df1

          Date     1         2         3       4      5 
  0 2000-12-05  PXX.TO    MX.TO    CAE.TO   HRX.TO  FR.TO
  1 2000-12-06  PXX.TO    MX.TO    CAE.TO   HRX.TO  FR.TO   
  2 2000-12-07  FTS.TO    MX.TO    CAE.TO   HRX.TO  FR.TO   
  3 2000-12-08  FTS.TO    MX.TO    CAE.TO   HRX.TO  FR.TO
  4 2000-12-09  FTS.TO    G.TO     CAE.TO   HRX.TO  TB.TO
  5 2000-12-10  FTS.TO    G.TO     KYU.TO   HRX.TO  TB.TO
  6 2000-12-11  FTS.TO    G.TO     KYU.TO   HRX.TO  TB.TO   
  7 2000-12-12  BAM-A.TO  G.TO     KYU.TO   HRX.TO  TB.TO   
  8 2000-12-13  BAM-A.TO  PLI.TO   KYU.TO   HRX.TO  TB.TO
  9 2000-12-14  BAM-A.TO  PLI.TO   KYU.TO   HRX.TO  TB.TO  
 10 2000-12-15  BAM-A.TO  PLI.TO   KYU.TO   HRX.TO  TB.TO

Second DataFrame = df2

          Date    1       2       3        4      5
  0 2000-12-05  2.3     60.10   2.30    34.98   35.00
  1 2000-12-06  2.35    60.70   2.38    35.43   35.01
  2 2000-12-07  56.76   61.31   2.46    35.89   35.02
  3 2000-12-08  57.33   61.92   2.54    36.35   35.04
  4 2000-12-09  57.90   100.20  2.63    36.83   300.90
  5 2000-12-10  58.48   101.00  69.56   37.30   304.18
  6 2000-12-11  59.07   101.81  70.46   37.78   307.50
  7 2000-12-12  4.50    102.62  71.37   38.27   310.85
  8 2000-12-13  4.54    44.50   72.29   38.77   314.24
  9 2000-12-14  4.57    45.39   73.23   39.27   317.66
 10 2000-12-15  4.61    46.30   74.18   39.78   321.12

Expected output = df3

           Date     1      2       3       4      5
   0 2000-12-05 1.0000  1.0000  1.0000  1.0000  1.0000
   1 2000-12-06 1.0200  1.0100  1.0340  1.0129  1.0003
   2 2000-12-07 1.0200  1.0201  1.0692  1.0260  1.0007
   3 2000-12-08 1.0302  1.0303  1.1055  1.0393  1.0010
   4 2000-12-09 1.0405  1.0303  1.1431  1.0528  1.0010
   5 2000-12-10 1.0509  1.0385  1.1431  1.0664  1.0119
   6 2000-12-11 1.0614  1.0469  1.1579  1.0802  1.0230
   7 2000-12-12 1.0614  1.0552  1.1729  1.0941  1.0341
   8 2000-12-13 1.0699  1.0552  1.1880  1.1083  1.0454
   9 2000-12-14 1.0785  1.0763  1.2034  1.1226  1.0568
  10 2000-12-15 1.0871  1.0979  1.2190  1.1371  1.0683

The calculation for the values in column 1 of df3 is shown below:

df3.row[0]  = 1
df3.row[1]  = (2.35/2.30) * 1 = 1.0200
df3.row[2]  = (56.76/56.76) * 1.0200 = 1.0200
df3.row[3]  = (57.33/56.76) * 1.0200 = 1.0302
df3.row[4]  = (57.90/57.33) * 1.0302 = 1.0405
df3.row[5]  = (58.48/57.90) * 1.0405 = 1.0509
df3.row[6]  = (59.07/58.48) * 1.0509 = 1.0614
df3.row[7]  = (4.50/4.50) * 1.0614 = 1.0614
df3.row[8]  = (4.54/4.50) * 1.0614 = 1.0699
df3.row[9]  = (4.57/4.54) * 1.0699 = 1.0785
df3.row[10] = (4.61/4.57) * 1.0785 = 1.0871
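
As a quick check of the arithmetic above, the following sketch recomputes column 1 from the prices exactly as printed in df2. Because those prices are shown rounded to two decimals, the result agrees with the expected table only to about two decimal places (2.35/2.30 is about 1.0217, not 1.0200). Presumably the expected values came from unrounded prices, but that is an assumption.

```python
tickers = ["PXX.TO", "PXX.TO", "FTS.TO", "FTS.TO", "FTS.TO", "FTS.TO",
           "FTS.TO", "BAM-A.TO", "BAM-A.TO", "BAM-A.TO", "BAM-A.TO"]
prices = [2.30, 2.35, 56.76, 57.33, 57.90, 58.48, 59.07, 4.50, 4.54, 4.57, 4.61]

col1 = [1.0]
for i in range(1, len(prices)):
    same = tickers[i] == tickers[i - 1]
    # use the price ratio only when the ticker is unchanged, else a flat 1.0
    ratio = prices[i] / prices[i - 1] if same else 1.0
    col1.append(col1[-1] * ratio)

for i, value in enumerate(col1):
    print(i, round(value, 4))
```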

Here is what I have so far. I'm not convinced this is the best approach.

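import pandas as pd

StartFromDay = 1
# one output column per holding; 'Date' is not a holding
holdings = [c for c in df1.columns if c != 'Date']
df3 = pd.DataFrame(index=df1.index, columns=holdings, dtype=float)

for col in holdings:

    # First row should equal 1 (assignment, not the comparison `==`)
    df3.loc[df3.index[0], col] = 1.0

    for i in range(StartFromDay, len(df1)):

        # compare with the previous row, not with the first row
        prev, cur = df1.index[i - 1], df1.index[i]
        if df1.loc[cur, col] == df1.loc[prev, col]:
            # same ticker: compound the previous value by the price ratio
            df3.loc[cur, col] = df3.loc[prev, col] * df2.loc[cur, col] / df2.loc[prev, col]
        else:
            # ticker changed: carry the previous value forward
            df3.loc[cur, col] = df3.loc[prev, col]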

Tags: python-3.x, loops, dataframe, if-statement

1 answer

Loops are slow, so let's do this in a vectorized way. First, set the index appropriately:

df1.set_index('Date', inplace=True)
df2.set_index('Date', inplace=True) 

Next, generate a boolean mask that is True wherever the symbol is the same as in the previous row:

same_stock = df1.iloc[1:].values == df1.iloc[:-1].values

We have to use values because the two shifted slices are no longer aligned on the index.
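
For comparison, here is a sketch of an index-preserving way to get the same mask with `DataFrame.shift`, which moves every row down by one so that each row lines up with the row above it. The small frame and the name `same_stock_df` are made up for illustration.

```python
import pandas as pd

# small hypothetical frame: the ticker in column 1 changes on the third row
df1 = pd.DataFrame(
    {1: ["AAA", "AAA", "BBB"], 2: ["CCC", "CCC", "CCC"]},
    index=pd.Index(["2000-12-05", "2000-12-06", "2000-12-07"], name="Date"),
)

# positional comparison on raw arrays: shape is (rows - 1, columns)
same_stock = df1.iloc[1:].values == df1.iloc[:-1].values

# index-preserving variant: compare each row with the row above it
same_stock_df = df1.eq(df1.shift())

print(same_stock)
print(same_stock_df)
```

The shifted version keeps the Date index and has one extra leading row, which is all False because the first row has nothing above it.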

Then build a matrix containing all the df2.row[1]/df2.row[0] values:

ret = df2.iloc[1:].values / df2.iloc[:-1].values

Next, overwrite the returns on the rows where the symbol changed:

ret[~same_stock] = 1 # pretend return is flat when symbol changed
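
If you would rather not modify the ratio array in place, `np.where` gives the same result as a new array. This is only a sketch; the input values are hypothetical and stand in for the arrays built in the previous steps.

```python
import numpy as np

# hypothetical inputs with the same layout as the arrays built above
ratio = np.array([[1.10, 1.02], [5.00, 1.03]])
same_stock = np.array([[True, True], [False, True]])

# keep the ratio where the ticker is unchanged, otherwise use a flat return of 1
ret = np.where(same_stock, ratio, 1.0)
print(ret)
```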

Now create a DataFrame with the results:

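# prepend one row of ones so that every column starts at 1,
# and reuse df1's column labels instead of the default 0..n-1
first_row = np.ones((1, ret.shape[1]))
simpret = pd.DataFrame(np.vstack((first_row, ret)), index=df1.index, columns=df1.columns)
df3 = simpret.cumprod()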
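
Putting the steps together, here is a self-contained sketch that runs on columns 1 and 2 of the question's tables. The data are typed in from the printed tables, and the dates are rebuilt with `pd.date_range` on the assumption that they are consecutive days. Since the printed prices are rounded, the output should match the expected df3 only approximately.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-12-05", periods=11, name="Date")

# columns 1 and 2 of df1 and df2 as printed in the question
df1 = pd.DataFrame({
    1: ["PXX.TO"] * 2 + ["FTS.TO"] * 5 + ["BAM-A.TO"] * 4,
    2: ["MX.TO"] * 4 + ["G.TO"] * 4 + ["PLI.TO"] * 3,
}, index=dates)
df2 = pd.DataFrame({
    1: [2.30, 2.35, 56.76, 57.33, 57.90, 58.48, 59.07, 4.50, 4.54, 4.57, 4.61],
    2: [60.10, 60.70, 61.31, 61.92, 100.20, 101.00, 101.81, 102.62, 44.50, 45.39, 46.30],
}, index=dates)

# True where the ticker equals the one in the previous row
same_stock = df1.iloc[1:].values == df1.iloc[:-1].values

# row-over-row price ratios, flattened to 1 where the ticker changed
ret = df2.iloc[1:].values / df2.iloc[:-1].values
ret[~same_stock] = 1.0

# leading row of ones, then the cumulative product down each column
first_row = np.ones((1, ret.shape[1]))
df3 = pd.DataFrame(
    np.vstack((first_row, ret)), index=df1.index, columns=df1.columns
).cumprod()

print(df3.round(4))
```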