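I ran a GLMM in R using glmer (from lme4). Compared with the estimated marginal means, the fixed-effect estimates are very different from what I expected (much smaller).
GLMM output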
```
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: inverse.gaussian  ( identity )
Formula: latency ~ Con_shape * Cue + (1 + Con_shape | subject)
   Data: SN_P_PMT_match_data_rt
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid
 81984.2  82071.9 -40979.1  81958.2     6269

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.1154 -0.6553 -0.1732  0.4628  4.9629

Random effects:
 Groups   Name               Variance  Std.Dev. Corr
 subject  (Intercept)        1.718e+03 41.44726
          Con_shapeS vs. F   4.076e+03 63.84257  0.02
          Con_shapeS vs. Str 4.896e+03 69.97401  0.18 -0.57
 Residual                    7.191e-05  0.00848
Number of obs: 6282, groups:  subject, 42

Fixed effects:
                                 Estimate Std. Error t value Pr(>|z|)
(Intercept)                       796.370     14.132  56.354  < 2e-16 ***
Con_shapeS vs. F                   16.932     13.337   1.270    0.204
Con_shapeS vs. Str                 95.656     15.344   6.234 4.55e-10 ***
CueSad vs Neu                      -3.186      4.045  -0.788    0.431
Con_shapeS vs. F:CueSad vs Neu     -7.216     10.676  -0.676    0.499
Con_shapeS vs. Str:CueSad vs Neu   -1.303     11.727  -0.111    0.911
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) C_Sv.F C_Sv.S CSdvsN C_vFvN
Cn_shpSvs.F  0.088
Cn_shpSvs.S  0.028 -0.271
CueSadvsNeu -0.014  0.016  0.001
C_Sv.F:CSvN -0.096 -0.099 -0.055  0.019
C_Sv.S:CSvN  0.061  0.129  0.001  0.130 -0.548
```
EMM
```
 Con_shape Cue       emmean       SE  df asymp.LCL asymp.UCL
 self      sad     728.7943 13.14691 Inf  703.0269  754.5618
 friend    sad     785.5311 13.31949 Inf  759.4254  811.6369
 stranger  sad     811.8101 13.48155 Inf  785.3868  838.2334
 self      neutral 733.1200 13.98504 Inf  705.7098  760.5302
 friend    neutral 781.1784 14.08376 Inf  753.5748  808.7821
 stranger  neutral 810.0302 14.53561 Inf  781.5409  838.5194
```
So I am wondering what causes this discrepancy. The variables are coded in R as follows:
```
# Contrast coding for the categorical variables (non-orthogonal)
contrasts(data$Con_shape) <- cbind("S vs. F"   = c(-.5, .5, 0),
                                   "S vs. Str" = c(-.5, 0, .5))
#          S vs. F S vs. Str
# self        -0.5      -0.5
# friend       0.5       0.0
# stranger     0.0       0.5

contrasts(data$Cue) <- cbind("Sad vs Neu" = c(-.5, .5))
#         Sad vs Neu
# sad           -0.5
# neutral        0.5
```
Here is the GLMM code:
```
library(lme4)

# Run glmer
model <- glmer(latency ~ Con_shape * Cue + (1 + Con_shape | subject),
               data = data,
               family = inverse.gaussian(link = "identity"),
               control = glmerControl(optimizer = "bobyqa",
                                      optCtrl = list(maxfun = 2e5)))
```
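Could the discrepancy come from using scaled contrast coding rather than unscaled coding (-1, 1, 0)? I tried unscaled contrast coding, which produced the significant effects I would expect from looking at the EMM plot, but the coefficients are again very different (and there is now a singular-fit problem).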
Unscaled contrasts
```
contrasts(SN_P_PMT_data$Con_shape) <- cbind("S vs. F"   = c(-1, 1, 0),
                                            "S vs. Str" = c(-1, 0, 1))
contrasts(SN_P_PMT_data$Cue) <- cbind("Sad vs Neu" = c(-1, 1))
```
GLMM output
```
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: inverse.gaussian  ( identity )
Formula: latency ~ 1 + Con_shape * Cue + (1 + Con_shape | subject)
   Data: SN_P_PMT_match_data_rt
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid
 82171.8  82259.5 -41072.9  82145.8     6269

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.0716 -0.6663 -0.1754  0.4809  5.3294

Random effects:
 Groups   Name               Variance  Std.Dev.  Corr
 subject  (Intercept)        1.734e+03 41.642073
          Con_shapeS vs. F   5.894e+00  2.427823 1.00
          Con_shapeS vs. Str 8.270e+02 28.758130 0.19 0.19
 Residual                    7.405e-05  0.008605
Number of obs: 6282, groups:  subject, 42

Fixed effects:
                                 Estimate Std. Error t value Pr(>|z|)
(Intercept)                       786.600     13.085  60.113  < 2e-16 ***
Con_shapeS vs. F                   10.198      3.057   3.336 0.000851 ***
Con_shapeS vs. Str                 43.327      8.685   4.989 6.07e-07 ***
CueSad vs Neu                      -1.804      2.118  -0.852 0.394476
Con_shapeS vs. F:CueSad vs Neu     -1.788      2.972  -0.602 0.547312
Con_shapeS vs. Str:CueSad vs Neu   -0.504      3.137  -0.161 0.872377
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) C_Sv.F C_Sv.S CSdvsN C_vFvN
Cn_shpSvs.F  0.220
Cn_shpSvs.S  0.127 -0.150
CueSadvsNeu -0.004  0.001 -0.007
C_Sv.F:CSvN  0.001 -0.009  0.013  0.009
C_Sv.S:CSvN  0.013  0.012 -0.006  0.150 -0.574
optimizer (bobyqa) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
```
That said, I would say that unless the model contains only one factor, interpreting the regression coefficients is complicated.
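Long ago, somebody made the very unfortunate choice of using the word "contrasts" for the methods that create dummy variables for factors. Often (but not always), the regression coefficients do comprise estimates of contrasts, but not always the ones seemingly implied by the coding, because there is an inverse relationship between the dummy coding and the coefficients. With some orthogonal codings, you are estimating comparable contrasts, but they are scaled differently.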
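To make the inverse relationship concrete, here is a minimal base-R sketch. It assumes a model with only the three-level factor and an identity link, so that the level means equal `X %*% beta`, where `X` is the coding matrix from the question with an intercept column prepended. The names `C`, `X`, `W`, and `W_unscaled` are mine, chosen for illustration.

```r
# Contrast coding from the question (rows: factor levels, columns: "contrasts")
C <- cbind("S vs. F"   = c(-0.5, 0.5, 0),
           "S vs. Str" = c(-0.5, 0, 0.5))
rownames(C) <- c("self", "friend", "stranger")

# Level means = X %*% beta, where X prepends an intercept column to the coding
X <- cbind("(Intercept)" = 1, C)

# Inverting X gives beta = solve(X) %*% means: each row of W holds the weights
# that the corresponding coefficient places on the three level means
W <- solve(X)
dimnames(W) <- list(colnames(X), rownames(X))
print(round(W, 3))

# Same computation for the unscaled (-1, 1, 0) coding, which is 2 * C
W_unscaled <- solve(cbind("(Intercept)" = 1, 2 * C))
dimnames(W_unscaled) <- dimnames(W)
print(round(W_unscaled, 3))
```

Under these assumptions the "S vs. F" row of `W` is (-2/3, 4/3, -2/3), which is twice the difference between the friend mean and the average of all three level means, not the friend-minus-self difference that the label suggests. The unscaled coding gives exactly half of those weights, which is consistent with the smaller coefficients in the second fit. In the two-factor model the same logic should apply to means averaged over Cue, but I have not checked that against the actual fit.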
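But as I said at the beginning, you really don't need to wade through those weeds. If you understand and trust the marginal means, they will give you a clearer interpretation than the regression coefficients will.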
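As a rough illustration of why the means are the safer thing to interpret, the sketch below fits the same one-factor linear model under both codings. The data are simulated, and the level means, sample size, and noise level are made up for the example. It uses plain `lm()` rather than the `glmer()` fit from the question, and `fit_with` is a hypothetical helper.

```r
set.seed(1)  # simulated data; the level means below are invented for illustration
lev <- c("self", "friend", "stranger")
true_means <- c(self = 730, friend = 785, stranger = 810)
d <- data.frame(shape = factor(rep(lev, each = 50), levels = lev))
d$y <- unname(true_means[as.character(d$shape)]) + rnorm(nrow(d), sd = 40)

# Hypothetical helper: fit the same model under a given contrast coding
fit_with <- function(data, C) {
  contrasts(data$shape) <- C
  lm(y ~ shape, data = data)
}

C_half <- cbind("S vs. F"   = c(-0.5, 0.5, 0),
                "S vs. Str" = c(-0.5, 0, 0.5))
fit_half <- fit_with(d, C_half)      # scaled coding
fit_one  <- fit_with(d, 2 * C_half)  # unscaled (-1, 1, 0) coding

# The coefficients depend on the coding ...
print(rbind(scaled = coef(fit_half), unscaled = coef(fit_one)))

# ... but the estimated level means, and differences between them, do not
newd <- data.frame(shape = factor(lev, levels = lev))
means_half <- predict(fit_half, newd)
means_one  <- predict(fit_one, newd)
names(means_half) <- names(means_one) <- lev
print(rbind(scaled = means_half, unscaled = means_one))
print(c("friend - self"   = unname(means_half["friend"] - means_half["self"]),
        "stranger - self" = unname(means_half["stranger"] - means_half["self"])))
```

With the actual mixed model, something like `pairs(emmeans(model, ~ Con_shape))` from the emmeans package should give the corresponding pairwise comparisons directly, whichever contrast coding was used to fit the model.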