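I am using an SVR-GARCH model to forecast conditional volatility, as described in Abdullah Karasan's book "Machine Learning for Financial Risk Management with Python: Algorithms for Modeling Risk".
My problem is that the code sometimes produces the same repeated value for conditional volatility across the entire forecast horizon. I know the initial parameter values are random, but I don't understand why, in most runs, the forecast is a constant over the whole period.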
import yfinance as yf
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform as sp_rand
from sklearn.preprocessing import StandardScaler
# Select assets
stock_name = ['AAPL']
end_date = datetime.today()
start_date = end_date - timedelta(days = 365 * 25)
# Download the prices
prices = yf.download(
stock_name,
start = start_date,
end = end_date,
interval = '1d',
)['Adj Close']
prices = prices.dropna()
stock_name = ['Apple']
# rename() with inplace=True returns None, so assign the renamed Series instead
prices = prices.rename(stock_name[0])
# Log returns
returns = np.log(np.array(prices)[1:] / np.array(prices)[:-1])
# Forecasting horizon
H = 146
returns_series = pd.Series(returns)
realized_vol = returns_series.rolling(5).std()
realized_vol = pd.DataFrame(realized_vol)
realized_vol.reset_index(drop=True, inplace=True)
returns_svm = pd.DataFrame(returns ** 2)
X = pd.concat([realized_vol, returns_svm], axis=1, ignore_index=True)
X = X[4:].copy()
X = X.reset_index()
X.drop('index', axis=1, inplace=True)
realized_vol = realized_vol.dropna().reset_index()
realized_vol.drop('index', axis=1, inplace=True)
conditional_volatility = pd.DataFrame(index=prices.index[-H:], columns=['SVM Linear','SVM RBF','SVM Poly'])
para_grid = {'gamma': sp_rand(0.1, 1), 'C': sp_rand(0.1, 10), 'epsilon': sp_rand(0.1, 1)}
svr_lin = SVR(kernel='linear')
clf = RandomizedSearchCV(svr_lin, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_lin = clf.predict(X[-H:])
conditional_volatility['SVM Linear'] = predict_svr_lin
svr_rbf = SVR(kernel='rbf')
clf = RandomizedSearchCV(svr_rbf, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_rbf = clf.predict(X[-H:])
conditional_volatility['SVM RBF'] = predict_svr_rbf
svr_poly = SVR(kernel='poly')
clf = RandomizedSearchCV(svr_poly, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_poly = clf.predict(X[-H:])
conditional_volatility['SVM Poly'] = predict_svr_poly
print(conditional_volatility)
[*********************100%%**********************] 1 of 1 completed
            SVM Linear   SVM RBF  SVM Poly
Date
2024-01-09    0.168156  0.168156  0.138204
2024-01-10    0.168156  0.168156  0.138204
2024-01-11    0.168156  0.168156  0.138204
2024-01-12    0.168156  0.168156  0.138204
2024-01-16    0.168156  0.168156  0.138204
...                ...       ...       ...
2024-08-01    0.168156  0.168156  0.138204
2024-08-02    0.168156  0.168156  0.138204
2024-08-05    0.168156  0.168156  0.138204
2024-08-06    0.168156  0.168156  0.138204
2024-08-07    0.168156  0.168156  0.138204
[146 rows x 3 columns]
Can anyone help me understand why this happens and how to fix it?
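The problem is with your epsilon values. According to the SVR documentation:
epsilon: Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value. Must be non-negative.
So with a high epsilon (one much larger than the volatility values), wrong predictions incur no penalty at all, and a flat prediction that stays within epsilon of every training target already has zero loss.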
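To make the epsilon-tube concrete, here is a minimal sketch of the epsilon-insensitive loss, using NumPy only. The helper name `epsilon_insensitive_loss` and the target values are assumptions for illustration; they are not the question's data, just numbers on a similar scale.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon-tube, linear in the excess outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Hypothetical volatility-like targets (illustrative values only).
y = np.array([0.001, 0.004, 0.006, 0.05, 0.07, 0.33])

# One flat prediction for every observation: the midpoint of the target range.
flat = np.full_like(y, (y.min() + y.max()) / 2)

# With a wide tube, every target sits inside it, so the flat line costs nothing.
loss_wide = epsilon_insensitive_loss(y, flat, epsilon=0.5).sum()

# With a narrow tube, the same flat line is penalized.
loss_narrow = epsilon_insensitive_loss(y, flat, epsilon=0.005).sum()

print(loss_wide, loss_narrow)
```

When the wide-tube loss is zero for a constant, the optimizer has no reason to use the features at all, which is consistent with the repeated values in the question's output.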
Let's check how your data is distributed:
print(np.quantile(realized_vol, q = [0,0.01,0.05,0.5,0.95, 0.99,1]))
# [0.00075101 0.00375092 0.0058641 0.04780242 0.06916338 0.33556116]
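Essentially, volatility is very low in the early part of the dataset and much higher later on. A question for you: why not express volatility as a percentage? For example, something like this: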
realized_vol_perc = 100 * returns_series.rolling(5).std() / returns_series
In any case, if you allow epsilon to be close to 1 (as in your code), you never penalize any of the volatility points, because they are all far below 1.
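One detail worth checking when reading the parameter grid: `sp_rand` is `scipy.stats.uniform`, which is parameterized as `(loc, scale)` and samples from `[loc, loc + scale]`, not from `[low, high]`. The short sketch below prints the actual bounds for the question's `sp_rand(0.1, 1)` and for a much narrower pair; the variable names are assumptions for illustration.

```python
from scipy.stats import uniform

# uniform(loc, scale) covers [loc, loc + scale].
wide = uniform(0.1, 1)         # the epsilon distribution used in the question
narrow = uniform(0.001, 0.01)  # a much smaller range, for comparison

# ppf(0) and ppf(1) give the lower and upper ends of the support.
wide_bounds = (wide.ppf(0), wide.ppf(1))
narrow_bounds = (narrow.ppf(0), narrow.ppf(1))

print(wide_bounds)    # roughly (0.1, 1.1)
print(narrow_bounds)  # roughly (0.001, 0.011)
```

So every epsilon drawn from `sp_rand(0.1, 1)` is at least 0.1, which is already large relative to most of the realized volatility values shown above.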
The fix is to change the parameter grid to something like this:
para_grid = {'gamma': sp_rand(0.1, 1), 'C': sp_rand(0.1, 10), 'epsilon': sp_rand(0.001, 0.01)}
Or multiply the volatility by a large constant:
para_grid = {'gamma': sp_rand(0.1, 1), 'C': sp_rand(0.1, 10), 'epsilon': sp_rand(0.1, 1)}
realized_vol = realized_vol * 100
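As a rough check of both fixes, the sketch below fits `SVR` on synthetic data rather than on the AAPL series. The data, the helper `fit_predict`, and the fixed `C` and `gamma` values are all assumptions chosen for illustration. It compares a wide epsilon, a narrow epsilon, and a wide epsilon with the target multiplied by 100.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in: one feature and a noisy target on a volatility-like scale.
X = rng.uniform(0.0, 0.3, size=(200, 1))
y = 0.8 * X[:, 0] + rng.normal(0.0, 0.01, size=200)

def fit_predict(X, y, epsilon):
    """Fit an RBF SVR with fixed (hypothetical) C and gamma; return in-sample predictions."""
    model = SVR(kernel='rbf', C=10.0, gamma=0.5, epsilon=epsilon).fit(X, y)
    return model.predict(X)

pred_wide = fit_predict(X, y, epsilon=0.5)          # tube wider than the whole target range
pred_narrow = fit_predict(X, y, epsilon=0.005)      # tube on the scale of the target
pred_scaled = fit_predict(X, 100 * y, epsilon=0.5)  # same wide epsilon, rescaled target

# np.ptp is max minus min: zero means every prediction is the same value.
print(np.ptp(pred_wide), np.ptp(pred_narrow), np.ptp(pred_scaled))
```

In this sketch only the wide-epsilon fit on the unscaled target collapses to a single value. If you rescale the target, remember to divide the predictions by the same constant before comparing them with the original volatility.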