模拟混合模型数据的相关随机截距和斜率

问题描述 投票:0回答:1

我在 R 中编写了一个数据生成函数,它模拟用于分层(多级)建模的数据。该函数生成具有相关截距和斜率的固定效应和随机效应。我不确定我是否已经在函数内正确实现了相关系数 (rho),表示随机截距和斜率之间的相关性。

我的函数如下所示:

data_generating_function <- function(n_classes, n_students, gamma00, gamma10, gamma01, 
                                     sd_intercept, sd_slope, rho_intercept_slope) {
  data <- data.frame(
    class_id = rep(1:n_classes, each = n_students),
    x1ij = rnorm(n_classes * n_students, mean = 0, sd = 1), # Level 1 predictor
    z1j = rep(rnorm(n_classes, mean  = 0), each = n_students),  # Level 2 predictor
    yij = 1  # Placeholder, will be updated later
  )
  
  X <- model.matrix(~ x1ij + z1j, data)  # fixed effects model matrix
  myFormula <- "yij ~ x1ij + z1j + (x1ij|class_id)"
  foo <- lFormula(eval(myFormula), data)
  Z <- t(as.matrix(foo$reTrms$Zt))  # random effects design matrix
  
  betas <- c(gamma00, gamma10, gamma01)  # Fixed effects vector 
  

# creating the random effects
  random_effects_per_class <- mvrnorm( 
    n = n_classes, 
    mu = c(0, 0), # Zero mean for the random effects
    Sigma = matrix(c(sd_intercept^2, rho_intercept_slope * sd_intercept * sd_slope, 
                     rho_intercept_slope * sd_intercept * sd_slope, sd_slope^2), 
                   nrow = 2, byrow = TRUE)
  )
  


  u <- as.vector(random_effects_per_class)
# random effects as vector
  
  e <- rnorm(n_classes * n_students, mean = 0, sd = 1)
# error
  
  data$yij <- X %*% betas + Z %*% u + e
# creating data

  
  return(data)
}

根据我的理解,

rho_intercept_slope
的增加应该会导致截距和斜率相关性的增加。

data <- data_generating_function(n_classes = 10, n_students = 100, gamma00 = 100, gamma10 = 3, gamma01 = 1, 
                                 sd_intercept = 5, sd_slope = 2, rho_intercept_slope = 0.5)


model <- lmer(yij ~ x1ij + z1j + (x1ij|class_id), data = data)
summary(model)

但是,无论

rho_intercept_slope
的值是多少,
Corr
的值始终在 0 左右。我还通过模拟检查了这一点,我运行了多次模拟。

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 class_id (Intercept) 11.8927  3.4486       
          x1ij        11.3084  3.3628   0.02
 Residual              0.9745  0.9872       
Number of obs: 10000, groups:  class_id, 100

关于如何解决这个问题有什么想法吗?

r simulation mixed-models
1个回答
0
投票

据我所知,问题在于您以错误的顺序解压缩了随机效应向量。我将相应的代码行更改为

u <- c(t(random_effects_per_class))

并得到了更明智的答案。无论如何,在我寻找错误之前,我制作了自己的数据生成函数,该函数使用内置的

simulate
方法......唯一棘手的部分是将标准差/相关参数转换为
lme4 的参数化
内部期望(这使用
lme4::Sv_to_Cv()
)...

dgf2 <- function(n_classes, n_students, gamma00, gamma10, gamma01, 
                    sd_intercept, sd_slope, rho_intercept_slope) {
    
    dd <- data.frame(
        class_id = rep(1:n_classes, each = n_students),
        x1ij = rnorm(n_classes * n_students, mean = 0, sd = 1), # Level 1 predictor
        z1j = rep(rnorm(n_classes, mean  = 0), each = n_students)  # Level 2 pr
    )
    ## convert sd/cor parameters to Cholesky factor
    theta <- unname(Sv_to_Cv(c(sd_intercept, rho_intercept_slope, sd_slope)))
    dd$yij <- simulate(~ x1ij + z1j + (x1ij|class_id),
                         newdata = dd,
                         newparams =  list(beta = c(gamma00, gamma10, gamma01),
                                           theta = theta,
                                           sigma = 1))[[1]]
    return(dd)
}

测试框架:

simfun <- function(dgf = data_generating_function) {
    data <- dgf(n_classes = 10, n_students = 100,
                gamma00 = 100, gamma10 = 3, gamma01 = 1, 
                sd_intercept = 5, sd_slope = 2,
                rho_intercept_slope = 0.5)
    model <- lmer(yij ~ x1ij + z1j + (x1ij|class_id), data = data)
    cov2cor(VarCorr(model)[[1]])[1,2]
}
set.seed(101)
rho_vec <- replicate(500, simfun())
hist(rho_vec)
rho_vec2 <- replicate(500, simfun(dgf2))
hist(rho_vec2)
© www.soinside.com 2019 - 2024. All rights reserved.