我在 R 中编写了一个数据生成函数,它模拟用于分层(多级)建模的数据。该函数生成具有相关截距和斜率的固定效应和随机效应。我不确定我是否已经在函数内正确实现了相关系数 (rho),表示随机截距和斜率之间的相关性。
我的函数如下所示:
data_generating_function <- function(n_classes, n_students, gamma00, gamma10, gamma01,
sd_intercept, sd_slope, rho_intercept_slope) {
data <- data.frame(
class_id = rep(1:n_classes, each = n_students),
x1ij = rnorm(n_classes * n_students, mean = 0, sd = 1), # Level 1 predictor
z1j = rep(rnorm(n_classes, mean = 0), each = n_students), # Level 2 predictor
yij = 1 # Placeholder, will be updated later
)
X <- model.matrix(~ x1ij + z1j, data) # fixed effects model matrix
myFormula <- "yij ~ x1ij + z1j + (x1ij|class_id)"
foo <- lFormula(eval(myFormula), data)
Z <- t(as.matrix(foo$reTrms$Zt)) # random effects design matrix
betas <- c(gamma00, gamma10, gamma01) # Fixed effects vector
# creating the random effects
random_effects_per_class <- mvrnorm(
n = n_classes,
mu = c(0, 0), # Zero mean for the random effects
Sigma = matrix(c(sd_intercept^2, rho_intercept_slope * sd_intercept * sd_slope,
rho_intercept_slope * sd_intercept * sd_slope, sd_slope^2),
nrow = 2, byrow = TRUE)
)
u <- as.vector(random_effects_per_class)
# random effects as vector
e <- rnorm(n_classes * n_students, mean = 0, sd = 1)
# error
data$yij <- X %*% betas + Z %*% u + e
# creating data
return(data)
}
根据我的理解,
rho_intercept_slope
的增加应该会导致截距和斜率相关性的增加。
data <- data_generating_function(n_classes = 10, n_students = 100, gamma00 = 100, gamma10 = 3, gamma01 = 1,
sd_intercept = 5, sd_slope = 2, rho_intercept_slope = 0.5)
model <- lmer(yij ~ x1ij + z1j + (x1ij|class_id), data = data)
summary(model)
但是,无论
rho_intercept_slope
的值是多少,Corr
的值始终在 0 左右。我还通过模拟检查了这一点,我运行了多次模拟。
Random effects:
Groups Name Variance Std.Dev. Corr
class_id (Intercept) 11.8927 3.4486
x1ij 11.3084 3.3628 0.02
Residual 0.9745 0.9872
Number of obs: 10000, groups: class_id, 100
关于如何解决这个问题有什么想法吗?
据我所知,问题在于您以错误的顺序解压缩了随机效应向量。我将相应的代码行更改为
u <- c(t(random_effects_per_class))
并得到了更明智的答案。无论如何,在我寻找错误之前,我制作了自己的数据生成函数,该函数使用内置的
simulate
方法......唯一棘手的部分是将标准差/相关参数转换为 lme4
的参数化
内部期望(这使用 lme4::Sv_to_Cv()
)...
dgf2 <- function(n_classes, n_students, gamma00, gamma10, gamma01,
sd_intercept, sd_slope, rho_intercept_slope) {
dd <- data.frame(
class_id = rep(1:n_classes, each = n_students),
x1ij = rnorm(n_classes * n_students, mean = 0, sd = 1), # Level 1 predictor
z1j = rep(rnorm(n_classes, mean = 0), each = n_students) # Level 2 pr
)
## convert sd/cor parameters to Cholesky factor
theta <- unname(Sv_to_Cv(c(sd_intercept, rho_intercept_slope, sd_slope)))
dd$yij <- simulate(~ x1ij + z1j + (x1ij|class_id),
newdata = dd,
newparams = list(beta = c(gamma00, gamma10, gamma01),
theta = theta,
sigma = 1))[[1]]
return(dd)
}
测试框架:
simfun <- function(dgf = data_generating_function) {
data <- dgf(n_classes = 10, n_students = 100,
gamma00 = 100, gamma10 = 3, gamma01 = 1,
sd_intercept = 5, sd_slope = 2,
rho_intercept_slope = 0.5)
model <- lmer(yij ~ x1ij + z1j + (x1ij|class_id), data = data)
cov2cor(VarCorr(model)[[1]])[1,2]
}
set.seed(101)
rho_vec <- replicate(500, simfun())
hist(rho_vec)
rho_vec2 <- replicate(500, simfun(dgf2))
hist(rho_vec2)