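My function takes the minimum over a range of a DataFrame whose length keeps growing, and the maximum over a range whose length shrinks on each iteration.
The DataFrame on which this is computed is itself a subset of another, larger DataFrame, which leads to nested loops and significantly increases the time complexity.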
import numpy as np
import pandas as pd

def drawdown(result_df, dict_dfs):
    last_year_df = pd.DataFrame(data=np.nan, index=result_df.index, columns=result_df.columns)
    for idx in range(len(result_df)):
        for stock in result_df.columns:
            # Window of up to 250 rows before the current date, plus the current date
            past_date_idx = max(0, idx - 250)
            past_date = result_df.index[past_date_idx]
            current_date = result_df.index[idx]
            last_year = dict_dfs['close'].loc[past_date:current_date, stock]
            drawdowns = []
            for i in range(len(last_year)):
                rolling_min = last_year.iloc[:i + 1].min()  # min over the growing prefix
                rolling_max = last_year.iloc[i:].max()      # max over the shrinking suffix
                if rolling_min != 0:
                    drawdown = (rolling_max - rolling_min) / rolling_min
                    drawdowns.append(drawdown)
            # Single .loc call instead of chained .iloc[idx][stock], which may write to a copy
            last_year_df.loc[current_date, stock] = np.median(drawdowns)
    return last_year_df
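Given this code, is there any function that could help me speed it up? If so, what changes should I make so that the logic stays the same but uses vectorized functions instead of loops?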
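The growing-minimum and shrinking-maximum ranges described above correspond to a prefix minimum and a suffix maximum, both of which can be computed cumulatively. Below is a minimal sketch for a single window, assuming the window contains no NaN values. The function names `median_drawdown_loop` and `median_drawdown_cum` are made up for this illustration and do not come from the original code.

```python
import numpy as np
import pandas as pd


def median_drawdown_loop(window: pd.Series) -> float:
    """Reference: the innermost loop from the question, for one window."""
    values = []
    for i in range(len(window)):
        lo = window.iloc[:i + 1].min()  # min over the growing prefix
        hi = window.iloc[i:].max()      # max over the shrinking suffix
        if lo != 0:
            values.append((hi - lo) / lo)
    return float(np.median(values))


def median_drawdown_cum(window: pd.Series) -> float:
    """Same quantity via cumulative min/max (assumes no NaN in the window)."""
    x = window.to_numpy(dtype=float)
    prefix_min = np.minimum.accumulate(x)              # min of x[:i + 1]
    suffix_max = np.maximum.accumulate(x[::-1])[::-1]  # max of x[i:]
    keep = prefix_min != 0                             # mirrors the `!= 0` guard
    return float(np.median((suffix_max[keep] - prefix_min[keep]) / prefix_min[keep]))


# First five closing prices from the sample data
prices = pd.Series([177.02, 180.08, 184.78, 191.27, 195.48])
print(median_drawdown_loop(prices), median_drawdown_cum(prices))
```

This replaces the quadratic inner loop with two linear passes per window; the loops over dates and stocks are left untouched here.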
@Micheal
After your comment:
def drawdown(result_df, dict_dfs):
    last_year_df = pd.DataFrame(data=np.nan, index=result_df.index, columns=result_df.columns)
    for idx in range(len(result_df)):
        for stock in result_df.columns:
            past_date_idx = max(0, idx - 250)
            past_date = result_df.index[past_date_idx]
            current_date = result_df.index[idx]
            last_year = dict_dfs['close'].loc[past_date:current_date, stock]
            max_dataframe = last_year.iloc[::-1]
            min_dataframe = last_year
            rolling_max = max_dataframe.cummax()
            rolling_min = min_dataframe.cummin()
            drawdown = (rolling_max - rolling_min) / rolling_min
            last_year_df.loc[current_date] = drawdown.iloc[-1]
    return last_year_df
At first glance the logic looks correct, but the results are not: the two versions give different outputs. Can anyone suggest what is going wrong here?
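One possible source of the mismatch, sketched on a made-up four-day series: the loop version stores the median of all per-position values in one cell, whereas the rewritten version keeps a single entry via `.iloc[-1]` and assigns it to the whole row with `last_year_df.loc[current_date]`, so each stock overwrites the previous one. It also drops the `rolling_min != 0` guard. The prices below are invented purely for illustration.

```python
import numpy as np
import pandas as pd

window = pd.Series(
    [100.0, 120.0, 90.0, 110.0],
    index=pd.to_datetime(["2005-02-01", "2005-02-02", "2005-02-03", "2005-02-04"]),
)

prefix_min = window.cummin()                        # min of window[:i + 1]
suffix_max = window.iloc[::-1].cummax().iloc[::-1]  # max of window[i:], flipped back to date order
per_position = (suffix_max - prefix_min) / prefix_min

median_value = per_position.median()  # what the loop version stores
last_value = per_position.iloc[-1]    # one entry only, as the rewritten version keeps
print(per_position.tolist(), median_value, last_value)
```

Flipping the reversed cumulative maximum back to date order before subtracting keeps both series on the same index order, so the result does not depend on how pandas aligns the labels.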
Here is a sample of result_df:
ABB IN Equity ABNL IN Equity
01-02-2005 FALSE FALSE
02-02-2005 FALSE FALSE
03-02-2005 FALSE FALSE
04-02-2005 FALSE FALSE
07-02-2005 FALSE FALSE
08-02-2005 FALSE FALSE
09-02-2005 FALSE FALSE
10-02-2005 FALSE FALSE
11-02-2005 FALSE FALSE
14-02-2005 FALSE FALSE
15-02-2005 FALSE FALSE
16-02-2005 FALSE FALSE
17-02-2005 FALSE FALSE
18-02-2005 FALSE FALSE
21-02-2005 FALSE FALSE
22-02-2005 FALSE FALSE
23-02-2005 FALSE FALSE
24-02-2005 FALSE FALSE
25-02-2005 FALSE FALSE
28-02-2005 FALSE FALSE
01-03-2005 FALSE FALSE
02-03-2005 FALSE FALSE
03-03-2005 FALSE FALSE
04-03-2005 FALSE FALSE
07-03-2005 FALSE FALSE
08-03-2005 FALSE FALSE
09-03-2005 FALSE FALSE
10-03-2005 FALSE FALSE
11-03-2005 FALSE FALSE
14-03-2005 FALSE FALSE
15-03-2005 FALSE FALSE
16-03-2005 FALSE FALSE
17-03-2005 FALSE FALSE
18-03-2005 FALSE FALSE
21-03-2005 FALSE FALSE
22-03-2005 FALSE FALSE
23-03-2005 FALSE FALSE
24-03-2005 FALSE FALSE
28-03-2005 FALSE FALSE
29-03-2005 FALSE FALSE
30-03-2005 FALSE FALSE
31-03-2005 FALSE FALSE
01-04-2005 FALSE FALSE
04-04-2005 FALSE FALSE
05-04-2005 FALSE FALSE
06-04-2005 FALSE FALSE
07-04-2005 FALSE FALSE
08-04-2005 FALSE FALSE
11-04-2005 FALSE FALSE
12-04-2005 FALSE FALSE
13-04-2005 FALSE FALSE
15-04-2005 FALSE FALSE
18-04-2005 FALSE FALSE
19-04-2005 FALSE FALSE
20-04-2005 FALSE FALSE
21-04-2005 FALSE FALSE
22-04-2005 FALSE FALSE
25-04-2005 FALSE FALSE
26-04-2005 FALSE FALSE
27-04-2005 FALSE FALSE
28-04-2005 FALSE FALSE
29-04-2005 FALSE FALSE
02-05-2005 FALSE FALSE
03-05-2005 FALSE FALSE
04-05-2005 FALSE FALSE
05-05-2005 FALSE FALSE
06-05-2005 FALSE FALSE
09-05-2005 FALSE FALSE
10-05-2005 FALSE FALSE
11-05-2005 FALSE FALSE
12-05-2005 FALSE FALSE
13-05-2005 FALSE FALSE
16-05-2005 FALSE FALSE
17-05-2005 FALSE FALSE
18-05-2005 FALSE FALSE
19-05-2005 FALSE FALSE
20-05-2005 FALSE FALSE
23-05-2005 FALSE FALSE
24-05-2005 FALSE FALSE
25-05-2005 FALSE FALSE
26-05-2005 FALSE FALSE
27-05-2005 FALSE FALSE
30-05-2005 FALSE FALSE
31-05-2005 FALSE FALSE
01-06-2005 FALSE FALSE
02-06-2005 FALSE FALSE
03-06-2005 FALSE FALSE
04-06-2005 FALSE FALSE
06-06-2005 FALSE FALSE
07-06-2005 FALSE FALSE
08-06-2005 FALSE FALSE
09-06-2005 FALSE FALSE
10-06-2005 FALSE FALSE
13-06-2005 FALSE FALSE
14-06-2005 FALSE FALSE
15-06-2005 FALSE FALSE
16-06-2005 FALSE FALSE
17-06-2005 FALSE FALSE
20-06-2005 FALSE FALSE
21-06-2005 FALSE FALSE
22-06-2005 FALSE FALSE
23-06-2005 FALSE FALSE
24-06-2005 FALSE FALSE
27-06-2005 FALSE FALSE
28-06-2005 FALSE FALSE
29-06-2005 FALSE FALSE
30-06-2005 FALSE FALSE
01-07-2005 FALSE FALSE
04-07-2005 FALSE FALSE
05-07-2005 FALSE FALSE
06-07-2005 FALSE FALSE
07-07-2005 FALSE FALSE
08-07-2005 FALSE FALSE
11-07-2005 TRUE FALSE
12-07-2005 FALSE FALSE
13-07-2005 FALSE FALSE
14-07-2005 FALSE FALSE
15-07-2005 FALSE FALSE
18-07-2005 FALSE FALSE
19-07-2005 FALSE FALSE
20-07-2005 FALSE FALSE
21-07-2005 FALSE FALSE
22-07-2005 FALSE FALSE
25-07-2005 FALSE FALSE
26-07-2005 FALSE FALSE
27-07-2005 FALSE FALSE
29-07-2005 FALSE FALSE
01-08-2005 FALSE FALSE
02-08-2005 FALSE FALSE
03-08-2005 FALSE FALSE
04-08-2005 FALSE FALSE
05-08-2005 FALSE FALSE
08-08-2005 FALSE FALSE
09-08-2005 FALSE FALSE
10-08-2005 FALSE FALSE
11-08-2005 FALSE FALSE
12-08-2005 FALSE FALSE
16-08-2005 FALSE FALSE
17-08-2005 FALSE FALSE
18-08-2005 FALSE FALSE
19-08-2005 FALSE FALSE
22-08-2005 FALSE FALSE
23-08-2005 FALSE FALSE
24-08-2005 FALSE FALSE
25-08-2005 FALSE FALSE
26-08-2005 FALSE FALSE
29-08-2005 FALSE FALSE
30-08-2005 FALSE FALSE
31-08-2005 FALSE FALSE
01-09-2005 FALSE FALSE
02-09-2005 FALSE FALSE
05-09-2005 FALSE FALSE
06-09-2005 FALSE FALSE
08-09-2005 FALSE FALSE
09-09-2005 FALSE FALSE
12-09-2005 FALSE FALSE
13-09-2005 FALSE FALSE
14-09-2005 FALSE FALSE
15-09-2005 FALSE FALSE
16-09-2005 FALSE FALSE
19-09-2005 FALSE FALSE
20-09-2005 FALSE FALSE
21-09-2005 FALSE FALSE
22-09-2005 FALSE FALSE
23-09-2005 FALSE FALSE
26-09-2005 FALSE FALSE
27-09-2005 FALSE FALSE
28-09-2005 FALSE FALSE
29-09-2005 FALSE FALSE
30-09-2005 FALSE FALSE
03-10-2005 FALSE FALSE
04-10-2005 FALSE FALSE
05-10-2005 FALSE FALSE
06-10-2005 FALSE FALSE
07-10-2005 FALSE FALSE
10-10-2005 FALSE FALSE
11-10-2005 FALSE FALSE
13-10-2005 FALSE FALSE
14-10-2005 FALSE FALSE
17-10-2005 FALSE FALSE
18-10-2005 FALSE FALSE
19-10-2005 FALSE FALSE
20-10-2005 FALSE FALSE
21-10-2005 FALSE FALSE
24-10-2005 FALSE FALSE
25-10-2005 FALSE FALSE
26-10-2005 FALSE FALSE
27-10-2005 FALSE FALSE
28-10-2005 FALSE FALSE
31-10-2005 FALSE FALSE
01-11-2005 FALSE FALSE
02-11-2005 FALSE FALSE
07-11-2005 FALSE FALSE
08-11-2005 FALSE FALSE
09-11-2005 FALSE FALSE
10-11-2005 FALSE FALSE
11-11-2005 FALSE FALSE
14-11-2005 FALSE FALSE
16-11-2005 FALSE FALSE
17-11-2005 FALSE FALSE
18-11-2005 FALSE FALSE
21-11-2005 FALSE FALSE
22-11-2005 FALSE FALSE
23-11-2005 FALSE FALSE
24-11-2005 FALSE FALSE
25-11-2005 FALSE FALSE
26-11-2005 FALSE FALSE
28-11-2005 FALSE FALSE
29-11-2005 FALSE FALSE
30-11-2005 FALSE FALSE
01-12-2005 FALSE FALSE
02-12-2005 FALSE FALSE
05-12-2005 FALSE FALSE
06-12-2005 FALSE FALSE
07-12-2005 FALSE FALSE
08-12-2005 FALSE FALSE
09-12-2005 FALSE FALSE
12-12-2005 FALSE FALSE
13-12-2005 FALSE FALSE
14-12-2005 FALSE FALSE
15-12-2005 FALSE FALSE
16-12-2005 FALSE FALSE
19-12-2005 FALSE FALSE
20-12-2005 FALSE FALSE
21-12-2005 FALSE FALSE
22-12-2005 FALSE FALSE
23-12-2005 FALSE FALSE
26-12-2005 FALSE FALSE
27-12-2005 FALSE FALSE
28-12-2005 FALSE FALSE
29-12-2005 FALSE FALSE
30-12-2005 FALSE FALSE
02-01-2006 FALSE FALSE
03-01-2006 FALSE FALSE
04-01-2006 FALSE FALSE
05-01-2006 FALSE FALSE
06-01-2006 FALSE FALSE
09-01-2006 FALSE FALSE
10-01-2006 FALSE FALSE
12-01-2006 FALSE FALSE
13-01-2006 FALSE FALSE
16-01-2006 FALSE FALSE
17-01-2006 FALSE FALSE
18-01-2006 FALSE FALSE
19-01-2006 FALSE FALSE
20-01-2006 TRUE FALSE
23-01-2006 FALSE FALSE
24-01-2006 FALSE FALSE
25-01-2006 TRUE FALSE
27-01-2006 TRUE FALSE
30-01-2006 FALSE FALSE
31-01-2006 FALSE FALSE
01-02-2006 FALSE FALSE
02-02-2006 FALSE FALSE
03-02-2006 FALSE FALSE
06-02-2006 FALSE FALSE
07-02-2006 FALSE FALSE
08-02-2006 FALSE FALSE
10-02-2006 FALSE FALSE
13-02-2006 FALSE FALSE
14-02-2006 FALSE FALSE
15-02-2006 FALSE FALSE
16-02-2006 FALSE FALSE
17-02-2006 FALSE FALSE
20-02-2006 FALSE FALSE
21-02-2006 FALSE FALSE
22-02-2006 FALSE FALSE
23-02-2006 FALSE FALSE
24-02-2006 FALSE FALSE
27-02-2006 FALSE FALSE
28-02-2006 FALSE FALSE
01-03-2006 FALSE FALSE
02-03-2006 FALSE FALSE
03-03-2006 FALSE FALSE
Here is a sample of dict_dfs['close']:
ABB IN Equity ABNL IN Equity
01-02-2005 177.02
02-02-2005 180.08
03-02-2005 184.78
04-02-2005 191.27
07-02-2005 195.48
08-02-2005 207.81
09-02-2005 223.48
10-02-2005 222.94
11-02-2005 222.65
14-02-2005 228.7
15-02-2005 225.64
16-02-2005 225.1
17-02-2005 222.4
18-02-2005 223.48
21-02-2005 223.66
22-02-2005 219
23-02-2005 220.96
24-02-2005 221.5
25-02-2005 219.7
28-02-2005 221.5
01-03-2005 227.8
02-03-2005 228.26
03-03-2005 229.79
04-03-2005 231.58
07-03-2005 227.1
08-03-2005 229.44
09-03-2005 227.82
10-03-2005 221.5
11-03-2005 217.92
14-03-2005 218.8
15-03-2005 216.46
16-03-2005 217.28
17-03-2005 216.46
18-03-2005 215.2
21-03-2005 214.34
22-03-2005 213.4
23-03-2005 208.93
24-03-2005 192.69
28-03-2005 201.7
29-03-2005 195.39
30-03-2005 195.59
31-03-2005 202.35
01-04-2005 207.09
04-04-2005 212.14
05-04-2005 207.81
06-04-2005 208.89
07-04-2005 209.79
08-04-2005 216.1
11-04-2005 211.78
12-04-2005 221.5
13-04-2005 222.58
15-04-2005 224.2
18-04-2005 226.9
19-04-2005 212.72
20-04-2005 209.81
21-04-2005 216.1
22-04-2005 216.64
25-04-2005 218.26
26-04-2005 218.82
27-04-2005 220.35
28-04-2005 218.98
29-04-2005 216.11
02-05-2005 210.7
03-05-2005 215.56
04-05-2005 210.71
05-05-2005 221.52
06-05-2005 220.7
09-05-2005 222.4
10-05-2005 221.5
11-05-2005 219.7
12-05-2005 223.3
13-05-2005 209.81
16-05-2005 225.1
17-05-2005 229.26
18-05-2005 226.01
19-05-2005 229.62
20-05-2005 230.86
23-05-2005 228.96
24-05-2005 231.4
25-05-2005 231.94
26-05-2005 229.78
27-05-2005 228.7
30-05-2005 230.86
31-05-2005 230.86
01-06-2005 231.05
02-06-2005 230.5
03-06-2005 228.97
04-06-2005 230.5
06-06-2005 231.4
07-06-2005 231.08
08-06-2005 234.11
09-06-2005 233.75
10-06-2005 233.25
13-06-2005 233.21
14-06-2005 234.49
15-06-2005 235.02
16-06-2005 232.12
17-06-2005 231.13
20-06-2005 226.9
21-06-2005 228.89
22-06-2005 232.33
23-06-2005 229.6
24-06-2005 231.42
27-06-2005 231.42
28-06-2005 232.68
29-06-2005 233.57
30-06-2005 233.21
01-07-2005 235.28
04-07-2005 236.83
05-07-2005 237.2
06-07-2005 255.72
07-07-2005 249.51
08-07-2005 264.72
11-07-2005 263.1
12-07-2005 265.8
13-07-2005 263.28
14-07-2005 261.35
15-07-2005 261.3
18-07-2005 262.92
19-07-2005 270.12
20-07-2005 268.59
21-07-2005 266.52
22-07-2005 263.1
25-07-2005 259.52
26-07-2005 267.96
27-07-2005 266.97
29-07-2005 256.08
01-08-2005 262.92
02-08-2005 263.14
03-08-2005 266.88
04-08-2005 273.22
05-08-2005 279.13
08-08-2005 274.98
09-08-2005 268.32
10-08-2005 276.44
11-08-2005 275.54
12-08-2005 272.16
16-08-2005 273.9
17-08-2005 276.08
18-08-2005 279.32
19-08-2005 276.42
22-08-2005 274.62
23-08-2005 265.8
24-08-2005 263.82
25-08-2005 265.62
26-08-2005 270.12
29-08-2005 273.72
30-08-2005 281.11
31-08-2005 284.53
01-09-2005 287.23
02-09-2005 289.03
05-09-2005 298.95
06-09-2005 307.94
08-09-2005 315.16
09-09-2005 312.98
12-09-2005 322.75
13-09-2005 326.86
14-09-2005 319.64
15-09-2005 316.94
16-09-2005 317.49
19-09-2005 313.34
20-09-2005 308.32
21-09-2005 306.44
22-09-2005 307.13
23-09-2005 304.72
26-09-2005 306.32
27-09-2005 307.95
28-09-2005 307.98
29-09-2005 305.24
30-09-2005 306.32
03-10-2005 307.04
04-10-2005 308.84
05-10-2005 313.88
06-10-2005 313.34
07-10-2005 306.14
10-10-2005 309.74
11-10-2005 290.83
13-10-2005 300.05
14-10-2005 292.63
17-10-2005 290.11
18-10-2005 289.75
19-10-2005 289.64
20-10-2005 283.09
21-10-2005 289.57
24-10-2005 297.04
25-10-2005 296.23
26-10-2005 302.76
27-10-2005 291.73
28-10-2005 289.95
31-10-2005 295.35
01-11-2005 297.37
02-11-2005 295.35
07-11-2005 295.33
08-11-2005 307.94
09-11-2005 316.62
10-11-2005 301.64
11-11-2005 331.35
14-11-2005 328.65
16-11-2005 329.59
17-11-2005 328.7
18-11-2005 335.31
21-11-2005 331.39
22-11-2005 327.78
23-11-2005 331.35
24-11-2005 331.89
25-11-2005 334.05
26-11-2005 345.04
28-11-2005 333.14
29-11-2005 352.96
30-11-2005 346.94
01-12-2005 345.04
02-12-2005 351.16
05-12-2005 346.66
06-12-2005 338.55
07-12-2005 336.39
08-12-2005 337.29
09-12-2005 338.55
12-12-2005 345.77
13-12-2005 343.06
14-12-2005 341.43
15-12-2005 342.15
16-12-2005 338.55
19-12-2005 343.24
20-12-2005 352.89
21-12-2005 351.34
22-12-2005 353.86
23-12-2005 352.06
26-12-2005 345.23
27-12-2005 346.84
28-12-2005 344.86
29-12-2005 344.14
30-12-2005 345.76
02-01-2006 344.14
03-01-2006 343.96
04-01-2006 352.08
05-01-2006 345.77
06-01-2006 357.46
09-01-2006 362.86
10-01-2006 360.18
12-01-2006 356.56
13-01-2006 357.46
16-01-2006 355.66
17-01-2006 355.86
18-01-2006 359.26
19-01-2006 371.87
20-01-2006 374.75
23-01-2006 397.98
24-01-2006 415.99
25-01-2006 447.5
27-01-2006 454
30-01-2006 441.58
31-01-2006 447.86
01-02-2006 438.07
02-02-2006 439.94
03-02-2006 432.2
06-02-2006 427.15
07-02-2006 426.79
08-02-2006 436.97
10-02-2006 430.39
13-02-2006 437.06
14-02-2006 430.03
15-02-2006 443
16-02-2006 441.42
17-02-2006 432.41
20-02-2006 424.09
21-02-2006 430.39
22-02-2006 432.02
23-02-2006 428.95
24-02-2006 432.2
27-02-2006 443
28-02-2006 443.9
01-03-2006 452.94
02-03-2006 458.85
03-03-2006 474.87
I hope this information is sufficient and helps in getting reproducible results.
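For anyone trying to reproduce this, here is a small sketch that rebuilds the first four rows of both samples as DataFrames. It assumes that the dates are day-first (`%d-%m-%Y`), that the single price per row belongs to `ABB IN Equity`, and that `ABNL IN Equity` has no data for these dates; those assumptions are read off the pasted tables and are not stated explicitly.

```python
import numpy as np
import pandas as pd

# Dates in the samples are day-month-year (e.g. 28-02-2005 is followed by 01-03-2005)
dates = pd.to_datetime(
    ["01-02-2005", "02-02-2005", "03-02-2005", "04-02-2005"], format="%d-%m-%Y"
)
columns = ["ABB IN Equity", "ABNL IN Equity"]

# Boolean frame, all FALSE for these rows as in the sample
result_df = pd.DataFrame(False, index=dates, columns=columns)

# Assumption: the listed prices are ABB's; ABNL is left as NaN
close = pd.DataFrame(
    {"ABB IN Equity": [177.02, 180.08, 184.78, 191.27], "ABNL IN Equity": np.nan},
    index=dates,
)
dict_dfs = {"close": close}
print(result_df.shape, close.shape)
```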
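Here is a new, optimized function. It uses list comprehensions and vectorizes the initial index computation. I tested it on 4 rows of your data. Hopefully it works well on all of your data; in any case, it can serve as a starting point for further optimization.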
import numpy as np
import pandas as pd

def drawdown_vec(result_df, dict_dfs):
    close = dict_dfs['close']
    last_year_df = pd.DataFrame(data=np.nan, index=result_df.index, columns=result_df.columns)
    # Window start positions, clipped at 0 (the index holds dates, so subtract from positions)
    past_dates_idx = np.maximum(0, np.arange(len(result_df)) - 250)
    past_dates = result_df.index[past_dates_idx]
    current_dates = result_df.index
    last_years = [
        close.loc[past_date:current_date, result_df.columns]
        for past_date, current_date in zip(past_dates, current_dates)
    ]
    max_dataframes = [
        last_year.iloc[::-1] for last_year in last_years
    ]
    min_dataframes = last_years
    # Suffix maximum: cummax over the reversed window, flipped back to date order
    rolling_max = [
        max_dataframe.cummax().iloc[::-1] for max_dataframe in max_dataframes
    ]
    # Prefix minimum
    rolling_min = [
        min_dataframe.cummin() for min_dataframe in min_dataframes
    ]
    # Entries with a zero minimum become NaN, which median() skips
    drawdowns = [
        ((r_max - r_min) / r_min).where(r_min != 0) for r_max, r_min in zip(rolling_max, rolling_min)
    ]
    # One median per stock for each window
    for current_date, drawdown in zip(current_dates, drawdowns):
        last_year_df.loc[current_date] = drawdown.median()
    return last_year_df
You could perhaps fold the last 4 list comprehensions into a single for loop. You can compare their performance in IPython with:
%timeit drawdown_vec(result_df, dict_dfs)
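As a rough sketch of that single-loop idea: the version below keeps only the loop over dates and handles all stocks of a window at once with NumPy, so no list of windows is held in memory. The name `drawdown_windows` and the `lookback` parameter are hypothetical additions for this example. It assumes there are no missing prices inside a window, because NumPy's cumulative functions propagate NaN differently from the pandas `.min()` and `.max()` calls in the loop version.

```python
import numpy as np
import pandas as pd


def drawdown_windows(result_df, dict_dfs, lookback=250):
    """Median drawdown per date and stock; only the loop over dates remains.

    Assumes no missing prices inside a window.
    """
    close = dict_dfs['close'][result_df.columns]
    out = pd.DataFrame(np.nan, index=result_df.index, columns=result_df.columns)
    for idx, current_date in enumerate(result_df.index):
        past_date = result_df.index[max(0, idx - lookback)]
        # Rows are dates, columns are stocks
        window = close.loc[past_date:current_date].to_numpy(dtype=float)
        prefix_min = np.minimum.accumulate(window, axis=0)              # min of window[:i + 1]
        suffix_max = np.maximum.accumulate(window[::-1], axis=0)[::-1]  # max of window[i:]
        with np.errstate(divide='ignore', invalid='ignore'):
            dd = (suffix_max - prefix_min) / prefix_min
        dd[prefix_min == 0] = np.nan  # mirrors the `!= 0` guard: such entries are skipped
        out.iloc[idx] = np.nanmedian(dd, axis=0)
    return out


# Usage on synthetic prices (random walk, invented for illustration)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2005-02-01", periods=30)
stocks = ["A", "B", "C"]
close = pd.DataFrame(100 + rng.normal(size=(30, 3)).cumsum(axis=0), index=dates, columns=stocks)
result_df = pd.DataFrame(False, index=dates, columns=stocks)
print(drawdown_windows(result_df, {"close": close}, lookback=5).tail())
```

Before timing anything, it is worth checking the output against the original nested-loop function on a small slice of real data, for example with `np.allclose(..., equal_nan=True)`.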