Why does GARCH-SVM output identical forecasts for conditional volatility?

Question

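I am using an SVR-GARCH model to forecast conditional volatility, as described in the book "Machine Learning for Financial Risk Management with Python: Algorithms for Modeling Risk" by Abdullah Karasan.

My problem is that the code sometimes produces the same repeated value for conditional volatility across the whole forecast horizon. I know the initial parameter values are random, but I don't understand why, in most cases, the forecast is a single constant value for the entire forecast period.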

import yfinance as yf
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform as sp_rand
from sklearn.preprocessing import StandardScaler

# Select assets
stock_name = ['AAPL']
end_date = datetime.today()
start_date = end_date - timedelta(days = 365 * 25)

# Download the prices
prices = yf.download(
    stock_name,
    start = start_date,
    end = end_date,
    interval = '1d',
)['Adj Close']
prices = prices.dropna()
stock_name = ['Apple']
prices = prices.rename(stock_name[0], inplace = True)

# Log returns
returns = np.log(np.array(prices)[1:] / np.array(prices)[:-1])

# Forecasting horizon
H = 146

returns_series = pd.Series(returns)
realized_vol = returns_series.rolling(5).std()
realized_vol = pd.DataFrame(realized_vol)
realized_vol.reset_index(drop=True, inplace=True)

returns_svm = pd.DataFrame(returns ** 2)

X = pd.concat([realized_vol, returns_svm], axis=1, ignore_index=True)
X = X[4:].copy()
X = X.reset_index()
X.drop('index', axis=1, inplace=True)

realized_vol = realized_vol.dropna().reset_index()
realized_vol.drop('index', axis=1, inplace=True)

conditional_volatility = pd.DataFrame(index=prices.index[-H:], columns=['SVM Linear','SVM RBF','SVM Poly'])

para_grid = {'gamma': sp_rand(0.1, 1), 'C': sp_rand(0.1, 10), 'epsilon': sp_rand(0.1, 1)}

svr_lin = SVR(kernel='linear')
clf = RandomizedSearchCV(svr_lin, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_lin = clf.predict(X[-H:])
conditional_volatility['SVM Linear'] = predict_svr_lin

svr_rbf = SVR(kernel='rbf')
clf = RandomizedSearchCV(svr_rbf, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_rbf = clf.predict(X[-H:])
conditional_volatility['SVM RBF'] = predict_svr_rbf

svr_poly = SVR(kernel='poly')
clf = RandomizedSearchCV(svr_poly, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_poly = clf.predict(X[-H:])
conditional_volatility['SVM Poly'] = predict_svr_poly

print(conditional_volatility)

[*********************100%%**********************]  1 of 1 completed
            SVM Linear   SVM RBF  SVM Poly
Date                                      
2024-01-09    0.168156  0.168156  0.138204
2024-01-10    0.168156  0.168156  0.138204
2024-01-11    0.168156  0.168156  0.138204
2024-01-12    0.168156  0.168156  0.138204
2024-01-16    0.168156  0.168156  0.138204
...                ...       ...       ...
2024-08-01    0.168156  0.168156  0.138204
2024-08-02    0.168156  0.168156  0.138204
2024-08-05    0.168156  0.168156  0.138204
2024-08-06    0.168156  0.168156  0.138204
2024-08-07    0.168156  0.168156  0.138204

[146 rows x 3 columns]

Can anyone help me understand why this happens and how to fix it?

Answer 1

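The problem is in your epsilon values. According to the SVR documentation:

epsilon - Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value. Must be non-negative.

So with a high epsilon (epsilon much larger than the volatility), wrong predictions are not penalized at all.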
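To make the effect of the tube width concrete, here is a minimal sketch on synthetic data (the arrays `X_demo` and `y_demo` and all parameter values are illustrative assumptions, not the AAPL series from the question). When epsilon exceeds the spread of the target, every training point already lies inside the tube, so a flat function has zero loss and the model should predict a single constant.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for the real data (an assumption, not the AAPL series):
# one feature and a target on the scale of daily volatility, roughly 0.01 to 0.05.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(200, 1))
y_demo = 0.01 + 0.04 * X_demo[:, 0] + rng.normal(0.0, 0.001, size=200)

# epsilon far larger than the spread of the target: every point sits inside the tube.
wide = SVR(kernel='rbf', gamma=1.0, C=10.0, epsilon=0.5).fit(X_demo, y_demo)
# epsilon on the same scale as the noise in the target.
narrow = SVR(kernel='rbf', gamma=1.0, C=10.0, epsilon=0.001).fit(X_demo, y_demo)

pred_wide = wide.predict(X_demo)
pred_narrow = narrow.predict(X_demo)

# Compare how many support vectors each model keeps and how much its predictions vary.
print('support vectors (wide tube):  ', len(wide.support_))
print('support vectors (narrow tube):', len(narrow.support_))
print('prediction range (wide tube):  ', np.ptp(pred_wide))
print('prediction range (narrow tube):', np.ptp(pred_narrow))
```

If the wide-tube model reports no support vectors and a prediction range of zero, that is the same symptom as the constant columns in the question.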

Let's check how your data is distributed:

print(np.quantile(realized_vol, q = [0,0.01,0.05,0.5,0.95, 0.99,1]))
# [0.00075101 0.00375092 0.0058641  0.04780242 0.06916338 0.33556116]

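Basically, volatility is very low at the beginning of the dataset and much higher in the later stages. A question for you here: why don't you calculate volatility as a percentage? For example, something like this: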

realized_vol_perc = 100 * returns_series.rolling(5).std() /  returns_series

Anyway, if you allow epsilon to be close to 1 (as in your code), you do not penalize any of the volatility points, because they are all well below 1.
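It may help to check which epsilon values a given grid entry can actually produce. In `scipy.stats.uniform`, the two positional arguments are `loc` and `scale`, and the distribution covers the interval from `loc` to `loc + scale`. The helper `sampled_range` below is a hypothetical name introduced only for this sketch.

```python
from scipy.stats import uniform as sp_rand

def sampled_range(loc, scale):
    """Return the (low, high) bounds of uniform(loc, scale), i.e. [loc, loc + scale]."""
    low, high = sp_rand(loc, scale).ppf([0.0, 1.0])
    return float(low), float(high)

# Grid entry from the question: epsilon is drawn from about 0.1 up to about 1.1.
original_range = sampled_range(0.1, 1)
# A much narrower entry: epsilon is drawn from about 0.001 up to about 0.011.
narrow_range = sampled_range(0.001, 0.01)

print('epsilon range, sp_rand(0.1, 1):     ', original_range)
print('epsilon range, sp_rand(0.001, 0.01):', narrow_range)
```

Comparing these bounds with the quantiles of `realized_vol` shows whether even the smallest epsilon the search can draw is already larger than most of the target values.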

The fix is to change the parameter grid to something like this:

para_grid = {'gamma': sp_rand(0.1, 1), 'C': sp_rand(0.1, 10), 'epsilon': sp_rand(0.001, 0.01)}

Or multiply the volatility by a large constant:

para_grid = {'gamma': sp_rand(0.1, 1), 'C': sp_rand(0.1, 10), 'epsilon': sp_rand(0.1, 1)}

realized_vol = realized_vol * 100
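A related option, which is not part of the answer above but a swapped-in alternative, is to let scikit-learn rescale the target instead of multiplying by a hand-picked constant. The sketch below assumes synthetic data (`X_demo`, `y_demo`) and illustrative hyperparameters; `TransformedTargetRegressor` standardizes the target before fitting and maps predictions back to the original units, so an epsilon such as 0.1 is measured in standard deviations of the target rather than in raw volatility.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the features and target (an assumption, not the AAPL data).
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 0.05, size=(300, 2))
y_demo = 0.5 * X_demo[:, 0] + 0.3 * X_demo[:, 1] + rng.normal(0.0, 0.001, size=300)

# Standardize the features inside the pipeline and the target via the wrapper.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel='rbf', C=1.0, epsilon=0.1)),
    transformer=StandardScaler(),
)

# Fit on the first 250 rows and predict the last 50, mirroring a train/forecast split.
model.fit(X_demo[:-50], y_demo[:-50])
pred = model.predict(X_demo[-50:])

# Predictions come back in the original units of y_demo.
print('prediction range:', np.ptp(pred))
```

The same wrapper can be passed to `RandomizedSearchCV`, in which case the SVR parameters are addressed with prefixed names such as `regressor__svr__epsilon`.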
