Generating predictions from a GAM (mgcv) fitted with a penalized cubic spline forced through a given point

Question (votes: 0, answers: 1)

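I am using mgcv to generate predictions from a GAM fitted with a cubic spline that is forced through a particular point. I built the GAM as outlined in this post: https://stat.ethz.ch/pipermail/r-help/2013-March/350253.html

However, when I try to predict y for new values of x within the range of the GAM, I get this error: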

Error: variable 'X' was fitted with type "nmatrix.8" but type "numeric" was supplied.

Here is the script I used to try to make the predictions:

  1. From the original post:
## Fake some data...

library(mgcv)
set.seed(0)
n <- 100
x <- runif(n)*4-1;x <- sort(x);
f <- exp(4*x)/(1+exp(4*x));y <- f+rnorm(100)*0.1;plot(x,y)
dat <- data.frame(x=x,y=y)

## Create a spline basis and penalty, making sure there is a knot
## at the constraint point, (0 here, but could be anywhere)
knots <- data.frame(x=seq(-1,3,length=9)) ## create knots
## set up smoother...
sm <- smoothCon(s(x,k=9,bs="cr"),dat,knots=knots)[[1]]

## 3rd parameter is value of spline at knot location 0,
## set it to 0 by dropping...
X <- sm$X[,-3]        ## spline basis
S <- sm$S[[1]][-3,-3] ## spline penalty
off <- y*0 + .6       ## offset term to force curve through (0, .6)

## fit spline constrained through (0, .6)...
b <- gam(y ~ X - 1 + offset(off),paraPen=list(X=list(S)))
lines(x,predict(b))
  2. Prediction:
library(tidyverse)
  
 # Predict values across concentrations within measured range
  x_seq <- seq(-1, 3, length.out = 10000)
  newdata <- data.frame(X = x_seq)

  # Generate the spline basis matrix for the new data
  sm_new <- smoothCon(s(X, k=9, bs="cr"), newdata, knots=knots)[[1]]
  X_new <- sm_new$X[,-3]  # Match the basis matrix structure used in the model
  
  # Create a data frame with the spline basis matrix
  X_new_df <- as.data.frame(X_new)
    
  # Predict the offset term for the new data
  off_new <- y*0 + .6 # Use the same offset as in the original model
  # Combine the new data with the basis matrix and the offset
  predict_data <- cbind(newdata, X_new_df, off_new) %>% 
    rename(off = off_new)

  # Predict values using the GAM
  predictions <- predict(b, newdata = predict_data, se.fit = TRUE)

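I also tried putting the variables the function needs into a list so that they could be passed through the "data" argument, but this produces a separate problem in which the matrix (X) is not recognized: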

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :  arguments imply differing number of rows: 27, 8 

Here is the adjusted script I used to build this model, which produces the error:

list.dat <- list(y = y, 
                   X = X, 
                   S = S, 
                   off = off)

b <- gam(y ~ X - 1 + offset(off), 
                     data = list.dat, paraPen=list(bigX=list(S)))

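Can you help me generate predictions of y for more values of x than are in the original data? I would prefer to supply my variables to the model as a list in the data argument, but I am open to other suggestions.

Many thanks for your advice! :)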

Tags: predict, spline, gam, mgcv

1 Answer (votes: 0)

You can now do this directly in s() via its pc argument (for "point constraint"):

library(mgcv)
set.seed(0)
n <- 100
x <- runif(n) * 4 - 1
x <- sort(x)
f <- exp(4 * x) / (1 + exp(4 * x))
y <- f + rnorm(100) * 0.1
dat <- data.frame(x = x, y = y)

m <- gam(y ~ s(x, pc = 0), data = dat, method = "REML")

Then predict() and related functions work as expected:

library("gratia")

ds <- data_slice(m, x = evenly(x))
fv <- fitted_values(m, data = ds) # calls predict
fv
# A tibble: 100 × 6
    .row      x .fitted    .se .lower_ci .upper_ci
   <int>  <dbl>   <dbl>  <dbl>     <dbl>     <dbl>
 1     1 -0.946  0.0364 0.0526  -0.0666      0.140
 2     2 -0.907  0.0375 0.0463  -0.0534      0.128
 3     3 -0.867  0.0386 0.0406  -0.0410      0.118
 4     4 -0.828  0.0402 0.0358  -0.0299      0.110
 5     5 -0.788  0.0424 0.0319  -0.0201      0.105
 6     6 -0.749  0.0455 0.0292  -0.0118      0.103
 7     7 -0.709  0.0497 0.0277  -0.00467     0.104
 8     8 -0.670  0.0554 0.0272   0.00202     0.109
 9     9 -0.630  0.0627 0.0273   0.00923     0.116
10    10 -0.591  0.0720 0.0276   0.0179      0.126
# ℹ 90 more rows
# ℹ Use `print(n = ...)` to see more rows
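
One caveat, offered as a reading of the model rather than something stated above: pc = 0 only forces the smooth itself to be zero at x = 0, and the model still estimates an intercept, so the fitted curve at x = 0 equals that intercept, not the 0.6 used in the question. A minimal sketch of one way to recover the (0, .6) constraint, assuming that point is still the target, is to drop the intercept and add a constant offset. The names m2, newd, p2, s_at_0 and fit_at_0 are illustrative and not part of the original code.

```r
library(mgcv)

## Same simulated data as above, plus a constant offset column
set.seed(0)
n <- 100
x <- sort(runif(n) * 4 - 1)
y <- exp(4 * x) / (1 + exp(4 * x)) + rnorm(n) * 0.1
dat <- data.frame(x = x, y = y, off = 0.6)

## Model from the answer: the smooth is constrained to 0 at x = 0
m <- gam(y ~ s(x, pc = 0), data = dat, method = "REML")
## Contribution of the smooth alone at the constraint point
s_at_0 <- predict(m, newdata = data.frame(x = 0), type = "terms")[1, "s(x)"]

## Assumed variant: no intercept, offset of 0.6, so the curve goes through (0, 0.6)
m2 <- gam(y ~ s(x, pc = 0) - 1 + offset(off), data = dat, method = "REML")

## Predict on a fine grid inside the observed range; newdata needs the offset too
newd <- data.frame(x = seq(min(x), max(x), length.out = 10000), off = 0.6)
p2 <- predict(m2, newdata = newd, se.fit = TRUE)

## Fitted value at the constraint point
fit_at_0 <- predict(m2, newdata = data.frame(x = 0, off = 0.6))
```

Because the offset is part of the formula, predict() includes it, which is why newd carries an off column.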

We can confirm the point constraint with a plot of the smooth:

draw(m) # or plot(m)

[Figure: plot of the estimated smooth, showing the smooth passing through y == 0 at the value x == 0]
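
If you would rather keep the smoothCon() fit from the question, here is a hedged sketch of predicting from it by hand. The model there is linear in the columns of X, so the same basis can be evaluated at new x values with PredictMat() on the original sm object (instead of being rebuilt with smoothCon()), and the prediction is then that basis times the coefficients plus the constant 0.6 offset. The names x_seq, X_new, fit, se and fit_at_0 are illustrative. I have not confirmed this as the route the question's author took.

```r
library(mgcv)

## Refit the constrained model from the question
set.seed(0)
n <- 100
x <- sort(runif(n) * 4 - 1)
y <- exp(4 * x) / (1 + exp(4 * x)) + rnorm(n) * 0.1
dat <- data.frame(x = x, y = y)

knots <- data.frame(x = seq(-1, 3, length = 9))  # knot 3 sits at x = 0
sm <- smoothCon(s(x, k = 9, bs = "cr"), dat, knots = knots)[[1]]
X <- sm$X[, -3]            # drop the basis column for the knot at 0
S <- sm$S[[1]][-3, -3]     # matching penalty
off <- rep(0.6, n)         # offset that lifts the curve to (0, 0.6)
b <- gam(y ~ X - 1 + offset(off), paraPen = list(X = list(S)))

## Evaluate the same basis at new x values and drop the same column
x_seq <- seq(-1, 3, length.out = 10000)
X_new <- PredictMat(sm, data.frame(x = x_seq))[, -3]

## Linear predictor by hand: basis %*% coefficients + offset
fit <- drop(X_new %*% coef(b)) + 0.6
## Pointwise standard errors from the coefficient covariance matrix
se <- sqrt(rowSums((X_new %*% vcov(b)) * X_new))

## Value of the fitted curve at the constraint point x = 0
X_0 <- PredictMat(sm, data.frame(x = 0))[, -3, drop = FALSE]
fit_at_0 <- drop(X_0 %*% coef(b)) + 0.6
```

This sidesteps the "nmatrix.8" type error because predict() is never asked to rebuild the matrix term X from a data frame of separate columns.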
