我一直在使用 R 中的 hmmTMB 包来使用隐马尔可夫模型 (HMM) 来建模和预测油价走势。在用训练数据拟合模型后,我正在努力使用测试数据集上的拟合模型对观察到的状态(例如石油价格)进行一步预测。
# Load necessary libraries
library(hmmTMB)
library(tidyverse)
# Set seed for reproducibility
set.seed(123)
# Example dataset preparation (extended for 1 year)
df <- data.frame(
day = seq.Date(from = as.Date("2020-01-01"), to = as.Date("2020-12-31"), by = "day"),
OILPRICE = rnorm(366, 100, 10)
) %>%
as_tibble()
# Splitting into training and test sets
split_index <- round(nrow(df) * 0.8)
training_set <- df[1:split_index, ] %>% as.data.frame()
testing_set <- df[(split_index + 1):nrow(df), ] %>% as.data.frame()
# Define a hidden Markov model with 2 states for the training set
hid1 <- MarkovChain$new(data = training_set, n_states = 2)
dists <- list(OILPRICE = "norm")
par0 <- list(OILPRICE = list(mean = c(110, 90), sd = c(1, 1)))
obs1 <- Observation$new(data = training_set, n_states = 2, dists = dists, par = par0)
par0 <- obs1$suggest_initial()
obs1 <- Observation$new(data = training_set, n_states = 2, dists = dists, par = par0)
hmm1 <- HMM$new(obs = obs1, hid = hid1)
hmm1$fit(silent = TRUE)
# Rename
data_plot <- training_set
# Color by most likely state sequence
data_plot$state <- factor(paste0("State ", hmm1$viterbi()))
ggplot(data_plot, aes(day, OILPRICE, col = state)) +
geom_point() +
scale_color_manual(values = pal, name = NULL)
# Attempting to make a one-step-ahead prediction
# ??? This is where I need guidance
如何使用 hmmTMB 在我的测试数据集 (testing_set) 上使用拟合模型 (hmm1) 对观察到的状态(例如第二天的 OILPRICE)进行一步预测?我正在寻找一种基于适合训练集的模型来预测未来油价值的方法。
我很欣赏有关如何使用 hmmTMB 包实现此目标的任何见解或示例。
hmmTMB 目前没有内置预测功能(截至 2024 年 2 月)。然而,可以根据拟合的 hmmTMB 模型计算预测分布。
预测分布是状态相关分布的混合,其中权重对应于处于不同状态的概率。因此,您可以使用以下工作流程:
u
u %*% (tpm %^% h)
,其中tpm
是状态过程的(一步)转移概率矩阵;也就是说,我们将 u
乘以 h 步转移概率矩阵在步骤 3 中,您可以计算预测平均值(作为状态相关平均值的加权和),或预测分布的概率密度函数,如下面的代码所示。
黑点显示最后一次观察训练集后 60 天的预测分布平均值。
线条显示了最后一次观察训练集后 60 天的预测分布的概率密度函数。正如你所看到的,它很可能一开始处于状态2,因此相应的状态相关分布具有更高的权重。一段时间后,随着状态分布接近平稳(长期)分布,预测分布停止变化。
# Load packages
library(scico)
library(expm)
# Grid of oil prices to plot forecast distributions
grid <- seq(min(training_set$OILPRICE),
max(training_set$OILPRICE),
length = 100)
# Get state distribution at last (training) observation
sp <- hmm1$state_probs()
u <- sp[nrow(sp),]
# Get model parameters
tpm <- hid1$tpm()[,,1]
obspar <- obs1$par()[,,1]
# Loop over forecast times (over 60 days here)
h <- 1:60
mix <- list()
mix_mean <- rep(NA, length = length(h))
for(i in seq_along(h)) {
# State distribution at time n + h
dist_h <- u %*% (tpm %^% h[i])
# Forecast distribution at time n + h
mix[[i]] <-
dist_h[1] * dnorm(x = grid,
mean = obspar["OILPRICE.mean", "state 1"],
sd = obspar["OILPRICE.sd", "state 1"]) +
dist_h[2] * dnorm(x = grid,
mean = obspar["OILPRICE.mean", "state 2"],
sd = obspar["OILPRICE.sd", "state 2"])
# Mean of forecast distribution
mix_mean[i] <- sum(dist_h * obspar["OILPRICE.mean",])
}
# Plot the forecast mean against time
df_mean <- data.frame(day = seq.Date(from = as.Date("2020-10-20"),
by = "day", length = length(h)),
mean = mix_mean)
ggplot(data_plot, aes(day, OILPRICE, col = state)) +
geom_point() +
geom_point(aes(y = mean, col = NULL), data = df_mean) +
scale_color_manual(values = pal)
# Plot the forecast distributions for h = 1, 2, ..., 60
df_mix <- data.frame(h = rep(h, each = 100),
price = grid,
mix = unlist(mix))
ggplot(df_mix, aes(price, mix, col = h, group = h)) +
geom_line() +
scale_color_scico(name = "time") +
labs(y = "forecast distribution")