我正在分析 variable_A(连续)对variable_B(连续)、variable_C(连续)和variable_D(具有“旧”和“新”级别的分类)的响应。
dat <- structure(list(variable_A = c(-0.0295856251366527, -1.24738195367725,
0.648945031694853, 0.580402160337185, 0.303715429301314, -0.553980060288217,
-0.607948144698327, 2.71061895113126, -1.53757429143602, 0.129366769790572,
0.16626208677442, -1.41291646270657, -0.200839313526748, 2.03483046591584,
-0.112578482123941, -0.561740791028714, -0.272330872554233, 0.71323537344756,
0.841184675731212, 0.707842765365241, 0.412738113173535, -0.700128662154531,
-1.68043030412451, -1.53152621643461, 0.242993861464165, -0.588067091651035,
-0.997892233042602, -1.58228732028551, -0.309374655641981, -0.891086439260152,
0.560086614100886, -1.00551936564398, 0.216711500378051, -0.452286951494992,
0.728493371811804, -0.720834439782271, 0.693795575916618, 0.0231649761405714,
0.287283950920568, -0.201169463943841, -0.680455750771586, 0.433109625250988,
1.23677460340502, -0.499702557524854, -1.17988096076043, -0.34533911717475,
0.733757575575177, 0.803025086512173, -0.956314331656592, 2.0610849115608,
1.26283438981272, -0.13524932927448, 1.18775411592947, -0.429116253594837,
1.24239940798592, 2.30364888316214, 1.60875733136455, -0.567027215576018,
0.129962056869789, -2.23963324058254, 0.481214590815243, 0.551147992003243,
-0.410903943379136, -0.367157561647619, -0.995447457890487, 0.468528939223885,
-0.295017486543682, 1.5471259341995, 0.149609004075832, -1.37356176484758,
-0.530120009280847), variable_B = c(0.909414301702991, -0.943449516247587,
0.781524506629484, -0.600911949819743, -1.00324860948739, -0.879024346862077,
0.148702302376609, 1.03680921377107, -0.818036316970879, -1.35838203594376,
0.716858807832897, -0.828797106744759, -0.668603665623089, 1.21332444520722,
1.35303750102937, -0.35838533300562, 0.586314940239296, 0.814575853257065,
0.789492242485147, 1.19281997662545, -0.910500570977607, 0.179361003357409,
-1.20694554173343, -0.865556160009293, -1.18138003775792, -0.460239810467593,
-0.347479702200378, -0.376211854380412, 0.695718104445771, 0.311847367815062,
0.422743354183219, -1.31000649682009, 0.688080878453372, -2.6910231963726,
0.959527136748011, 0.816023979122475, 2.47961675089516, 0.417574474989369,
-0.784547963068128, 0.660568497111298, -1.17583280451095, 0.910685709373682,
0.0893873399996857, -0.519872558390938, -0.371391026023119, 1.16814083225284,
0.942464870838783, 1.08748559762075, -0.156271108557384, 1.03813800412845,
0.912245187233877, -0.864089336414513, -0.519872558390938, 0.322936960762913,
0.395796664489034, -0.0134883824090008, 0.880364781167251, -1.02201375308388,
0.946978950240417, -0.182339630757281, 0.566561585615522, -0.663533281364679,
0.732415185925742, -0.466304170570451, -0.672450145062805, -0.519872558390938,
-3.00315063163361, 1.86543741543252, -0.954446227371451, -1.15998107435428,
0.824664738419366), variable_C = c(0.419730106958754, 0.20723938151934,
-0.923643462344663, -0.840808094800484, 2.1592728253865, 0.569193922310206,
-0.756171958396649, -0.963260377257096, -1.0695057399768, -0.606708143045197,
-0.541880464097579, 0.484557785906372, 1.88555595871878, -0.891229622870853,
-0.887628085151541, -0.511267393483426, -0.76517580269493, -0.974064990415032,
-0.756171958396649, -0.939850382081567, -0.554485846115171, -0.165519772429464,
-0.502263549185146, -0.367205884710942, 1.34892683854128, 1.49658988503307,
-0.540079695237923, -0.448240483395464, -0.891229622870853, 0.333293201695263,
-0.808394255326675, -0.70575043032628, -0.864218089976013, 0.171224004326219,
-0.945252688660535, -0.56889199699242, -0.414025875061999, -0.675137359712127,
-0.385213573307502, -0.806593486467019, 0.313484744239047, -1.02628728734506,
-0.224945144798113, -0.781382722431834, -0.619313525062789, -0.644524289097974,
-1.08571265971371, -0.990271910151936, -0.594102761027605, 3.22712875916254,
-0.550884308395859, 0.506167012222245, 0.913140774504513, -0.597704298746917,
-0.269964366289515, -0.657129671115566, -0.884026547432229, -0.356401271553005,
-1.04069343822231, -0.00705211277973086, -0.462646634272713,
0.08838863678204, -0.981268065853656, -0.414025875061999, -0.716555043484216,
0.194633999501747, -0.619313525062789, -0.907436542607758, -0.775980415852866,
-0.289772823745731, -0.597704298746917), variable_D = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L,
1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 2L), levels = c("old", "new"), class = "factor")), row.names = 97:167, class = "data.frame")
我假设变量_A 对变量_B 的响应取决于变量_C,并且这在变量_D 的级别之间有所不同。因此,我对所有变量的相互作用感兴趣。我安装了一个线性模型:
fit <- lm(variable_A ~ variable_B*variable_C*variable_D, data = dat)
summary(fit)
输出:
Call:
lm(formula = variable_A ~ variable_B * variable_C * variable_D,
data = dat)
Residuals:
Min 1Q Median 3Q Max
-1.92931 -0.58884 -0.06014 0.56585 2.57492
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.18132 0.15653 1.158 0.251063
variable_B 0.51698 0.13378 3.865 0.000266 ***
variable_C 0.07710 0.16378 0.471 0.639432
variable_Dnew -0.42993 0.27514 -1.563 0.123157
variable_B:variable_C 0.14710 0.16839 0.874 0.385679
variable_B:variable_Dnew -0.18376 0.33968 -0.541 0.590441
variable_C:variable_Dnew -0.05801 0.35809 -0.162 0.871824
variable_B:variable_C:variable_Dnew -0.78161 0.39110 -1.999 0.049982 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8798 on 63 degrees of freedom
Multiple R-squared: 0.3034, Adjusted R-squared: 0.226
F-statistic: 3.92 on 7 and 63 DF, p-value: 0.001315
我想使用包
marginaeffects
来进一步研究重要的三向交互(variable_B:variable_C:variable_Dnew
)。如何测试variable_A对variable_B的响应受variable_C影响的方向?我如何测试这在variable_D级别之间有何不同以及variable_D的影响对于哪些子集显着?
我设法绘制了模型预测,这给出了效应方向的初步印象。当变量_C 的值较高时(变量_C 用于分面),变量_B(在 x 轴上)和变量_A(在 y 轴上)之间的关系似乎与变量_D(用于着色)的“新”级别相反,但是不适合“旧”级别。我正在寻找一种通过统计支持这种可视化的方法,最好使用
marginal effects
包中的函数。
marginaleffects::plot_predictions(fit, condition = list("variable_B", "variable_D", "variable_C"))
A
相对于B
的斜率是治疗 B 的微小变化对结果 A 的“影响”的度量。使用 newdata
参数和 datagrid()
函数,您可以计算该斜率对于不同的 C
值。在这里,我们将 C
设置为其最小值和最大值:
library(marginaleffects)
slopes(fit,
variables = "variable_B",
newdata = datagrid(variable_C = range))
>
> variable_C Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
> -1.09 0.357 0.181 1.98 0.0481 4.4 0.00304 0.712
> 3.23 0.992 0.607 1.63 0.1024 3.3 -0.19831 2.182
当C=-1.09时,C对A的“影响”等于0.357,当C=3.23时,C对A的“影响”等于0.992。这两个估计的效果是否不同?我们可以使用
hypothesis
参数进行检查:
slopes(fit,
hypothesis = "b2 - b1 = 0",
variables = "variable_B",
newdata = datagrid(variable_C = range))
>
> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
> 0.634 0.726 0.874 0.382 1.4 -0.789 2.06
两个斜率不同,但这种差异不具有统计显着性。
首先,计算我们关心的 C 和 D 的每个组合的斜率:
slopes(fit,
variables = "variable_B",
newdata = datagrid(variable_C = range, variable_D = unique))
>
> variable_C variable_D Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
> -1.09 new 1.022 0.308 3.32 <0.001 10.1 0.41870 1.626
> -1.09 old 0.357 0.181 1.98 0.0481 4.4 0.00304 0.712
> 3.23 new -1.714 1.356 -1.26 0.2062 2.3 -4.37285 0.944
> 3.23 old 0.992 0.607 1.63 0.1024 3.3 -0.19831 2.182
然后,我们可以更进一步,计算一些差异,然后是差异中的差异:
# Difference in Effect of B on A, when C is high or low, holding D=new
slopes(fit,
hypothesis = "b3 - b1 = 0",
variables = "variable_B",
newdata = datagrid(variable_C = range, variable_D = unique))
>
> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
> -2.74 1.52 -1.8 0.0723 3.8 -5.72 0.247
# Difference in Effect of B on A, when C is high or low, holding D=old
slopes(fit,
hypothesis = "b4 - b2 = 0",
variables = "variable_B",
newdata = datagrid(variable_C = range, variable_D = unique))
>
> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
> 0.634 0.726 0.874 0.382 1.4 -0.789 2.06
# Difference in differences
slopes(fit,
hypothesis = "(b4 - b2) - (b3 - b1) = 0",
variables = "variable_B",
newdata = datagrid(variable_C = range, variable_D = unique))
>
> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
> 3.37 1.69 2 0.0457 4.5 0.065 6.68
>