Python 中与 Statsmodels 的交叉随机效应

问题描述 投票:0回答:1

考虑这个玩具数据集,用交叉随机效应进行模拟,然后用

lme4::lmer
:

拟合一个混合模型
set.seed(123)

# Number of subjects and items
num_subjects <- 30
num_items <- 20

# Generate subject and item IDs
subject <- factor(rep(1:num_subjects, each=num_items))
item <- factor(rep(1:num_items, num_subjects))

# Generate fixed effect predictor
x <- rnorm(num_subjects * num_items, mean=0, sd=1)

# Generate random intercepts for subjects and items
subject_intercepts <- rnorm(num_subjects, mean=0, sd=2)
item_intercepts <- rnorm(num_items, mean=0, sd=1)

# Create a data frame to hold the data
dt <- data.frame(subject, item, x)

# Add random intercepts to the data frame
dt$subject_intercept <- subject_intercepts[dt$subject]
dt$item_intercept <- item_intercepts[dt$item]

# Generate residual error
residual_error <- rnorm(num_subjects * num_items, mean=0, sd=1)

# Generate response variable y
dt$y <- 5 + 3 * dt$x + dt$subject_intercept + dt$item_intercept + residual_error

# Fit the linear mixed-effects model
model <- lmer(y ~ x + (1|subject) + (1|item), data=dt)

# Summarize the model

summary(model)

dt$subject_intercept <- NULL
dt$item_intercept <- NULL

write.csv(dt, "crossed_re_01.csv")

摘要输出的显着部分是:

Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 4.818    2.195   
 item     (Intercept) 0.778    0.882   
 Residual             1.030    1.015   

现在我们在 Python 中使用相同的数据来拟合(我们希望的)相同的模型,使用

statsmodels
:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.mixed_linear_model import MixedLM

data = pd.read_csv("crossed_re_01.csv")

# Fit the linear mixed-effects model with random intercepts for both subject and item
model = MixedLM.from_formula('y ~ x', data, re_formula='1', groups=data['subject'],
                             vc_formula={'item': '0 + C(item)'})
result = model.fit(reml=True, method='lbfgs')

# Print the summary of the model
print(result.summary())

# Extracting variance components
subject_var = result.cov_re.iloc[0, 0]  # Variance of subject random effect
item_var = result.vcomp[0]              # Variance of item random effect
residual_var = result.scale             # Residual variance

print("\nRandom Effects Variance Components:")
print(f"Subject Variance: {subject_var}")
print(f"Item Variance: {item_var}")

print("\nResidual Variance:")
print(f"Residual Variance: {residual_var}")

相关输出为:

---------------------------------------------------------
Intercept  4.526     0.403  11.225  0.000   3.736   5.316
x          2.943     0.058  50.820  0.000   2.829   3.056
Group Var  4.786                                         
item Var   0.443                                         
========================================================


Random Effects Variance Components:
Subject Variance: 4.786
Item Variance: 0.442

Residual Variance:
Residual Variance: 1.363

我预计方差分量会得到相当相似的结果。

Subject
随机截距非常一致,但
item
和残差方差明显不同。

我的下一步将是蒙特卡罗模拟,但在开始之前我想确保我在 Python 中做正确的事情?

python r statsmodels lme4 mixed-models
1个回答
0
投票

抱歉,不知道答案,但已经使用与您在 R 中使用的数据类似的数据创建了 python 示例。也许这会对 smn 有帮助。

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.mixed_linear_model import MixedLM
import numpy as np

np.random.seed(55)
num_subjects = 30
num_items = 20
size = 1 * num_subjects * num_items


# Generate subject and item IDs

feat_subj = np.tile(np.arange(num_subjects), int(size/num_subjects))
feat_items = np.tile(np.arange(num_items), int(size/num_items))


x =np.random.normal(0, 1, size=[size])
data = pd.DataFrame({'x': x,
              'subject': feat_subj,
              'item': feat_items,
              'school': 1})


subject_intercepts = np.random.normal(0,2, size=[num_subjects])
item_intercepts = np.random.normal(0,1,size=[num_items])
residual_error = np.random.normal(0,1,size=[size])


data['y'] = 5 + 3 * data['x'] + subject_intercepts[data['subject']] + item_intercepts[data['item']] + residual_error


# # Fit the linear mixed-effects model with random intercepts for both subject and item
model = MixedLM.from_formula('y ~ x', data, 
                             re_formula='1', 
                             groups=data['school'],
                             vc_formula={'item': "0 + C(item)",
                                         'subject': "0 + C(subject)" })

result = model.fit(reml=True, method='lbfgs')

# # # Print the summary of the model
print(result.summary())

© www.soinside.com 2019 - 2024. All rights reserved.