考虑这个玩具数据集,用交叉随机效应进行模拟,然后用
lme4::lmer
: 拟合一个混合模型
set.seed(123)
# Number of subjects and items
num_subjects <- 30
num_items <- 20
# Generate subject and item IDs
subject <- factor(rep(1:num_subjects, each=num_items))
item <- factor(rep(1:num_items, num_subjects))
# Generate fixed effect predictor
x <- rnorm(num_subjects * num_items, mean=0, sd=1)
# Generate random intercepts for subjects and items
subject_intercepts <- rnorm(num_subjects, mean=0, sd=2)
item_intercepts <- rnorm(num_items, mean=0, sd=1)
# Create a data frame to hold the data
dt <- data.frame(subject, item, x)
# Add random intercepts to the data frame
dt$subject_intercept <- subject_intercepts[dt$subject]
dt$item_intercept <- item_intercepts[dt$item]
# Generate residual error
residual_error <- rnorm(num_subjects * num_items, mean=0, sd=1)
# Generate response variable y
dt$y <- 5 + 3 * dt$x + dt$subject_intercept + dt$item_intercept + residual_error
# Fit the linear mixed-effects model
model <- lmer(y ~ x + (1|subject) + (1|item), data=dt)
# Summarize the model
summary(model)
dt$subject_intercept <- NULL
dt$item_intercept <- NULL
write.csv(dt, "crossed_re_01.csv")
摘要输出的显着部分是:
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 4.818 2.195
item (Intercept) 0.778 0.882
Residual 1.030 1.015
现在我们在 Python 中使用相同的数据来拟合(我们希望的)相同的模型,使用
statsmodels
:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.mixed_linear_model import MixedLM
data = pd.read_csv("crossed_re_01.csv")
# Fit the linear mixed-effects model with random intercepts for both subject and item
model = MixedLM.from_formula('y ~ x', data, re_formula='1', groups=data['subject'],
vc_formula={'item': '0 + C(item)'})
result = model.fit(reml=True, method='lbfgs')
# Print the summary of the model
print(result.summary())
# Extracting variance components
subject_var = result.cov_re.iloc[0, 0] # Variance of subject random effect
item_var = result.vcomp[0] # Variance of item random effect
residual_var = result.scale # Residual variance
print("\nRandom Effects Variance Components:")
print(f"Subject Variance: {subject_var}")
print(f"Item Variance: {item_var}")
print("\nResidual Variance:")
print(f"Residual Variance: {residual_var}")
相关输出为:
---------------------------------------------------------
Intercept 4.526 0.403 11.225 0.000 3.736 5.316
x 2.943 0.058 50.820 0.000 2.829 3.056
Group Var 4.786
item Var 0.443
========================================================
Random Effects Variance Components:
Subject Variance: 4.786
Item Variance: 0.442
Residual Variance:
Residual Variance: 1.363
我预计方差分量会得到相当相似的结果。
Subject
随机截距非常一致,但 item
和残差方差明显不同。
我的下一步将是蒙特卡罗模拟,但在开始之前我想确保我在 Python 中做正确的事情?
抱歉,不知道答案,但已经使用与您在 R 中使用的数据类似的数据创建了 python 示例。也许这会对 smn 有帮助。
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.mixed_linear_model import MixedLM
import numpy as np
np.random.seed(55)
num_subjects = 30
num_items = 20
size = 1 * num_subjects * num_items
# Generate subject and item IDs
feat_subj = np.tile(np.arange(num_subjects), int(size/num_subjects))
feat_items = np.tile(np.arange(num_items), int(size/num_items))
x =np.random.normal(0, 1, size=[size])
data = pd.DataFrame({'x': x,
'subject': feat_subj,
'item': feat_items,
'school': 1})
subject_intercepts = np.random.normal(0,2, size=[num_subjects])
item_intercepts = np.random.normal(0,1,size=[num_items])
residual_error = np.random.normal(0,1,size=[size])
data['y'] = 5 + 3 * data['x'] + subject_intercepts[data['subject']] + item_intercepts[data['item']] + residual_error
# # Fit the linear mixed-effects model with random intercepts for both subject and item
model = MixedLM.from_formula('y ~ x', data,
re_formula='1',
groups=data['school'],
vc_formula={'item': "0 + C(item)",
'subject': "0 + C(subject)" })
result = model.fit(reml=True, method='lbfgs')
# # # Print the summary of the model
print(result.summary())