分析与边际效应包的三向交互

问题描述 投票:0回答:1

我正在分析 variable_A(连续)对variable_B(连续)、variable_C(连续)和variable_D(具有“旧”和“新”级别的分类)的响应。

dat <- structure(list(variable_A = c(-0.0295856251366527, -1.24738195367725, 
0.648945031694853, 0.580402160337185, 0.303715429301314, -0.553980060288217, 
-0.607948144698327, 2.71061895113126, -1.53757429143602, 0.129366769790572, 
0.16626208677442, -1.41291646270657, -0.200839313526748, 2.03483046591584, 
-0.112578482123941, -0.561740791028714, -0.272330872554233, 0.71323537344756, 
0.841184675731212, 0.707842765365241, 0.412738113173535, -0.700128662154531, 
-1.68043030412451, -1.53152621643461, 0.242993861464165, -0.588067091651035, 
-0.997892233042602, -1.58228732028551, -0.309374655641981, -0.891086439260152, 
0.560086614100886, -1.00551936564398, 0.216711500378051, -0.452286951494992, 
0.728493371811804, -0.720834439782271, 0.693795575916618, 0.0231649761405714, 
0.287283950920568, -0.201169463943841, -0.680455750771586, 0.433109625250988, 
1.23677460340502, -0.499702557524854, -1.17988096076043, -0.34533911717475, 
0.733757575575177, 0.803025086512173, -0.956314331656592, 2.0610849115608, 
1.26283438981272, -0.13524932927448, 1.18775411592947, -0.429116253594837, 
1.24239940798592, 2.30364888316214, 1.60875733136455, -0.567027215576018, 
0.129962056869789, -2.23963324058254, 0.481214590815243, 0.551147992003243, 
-0.410903943379136, -0.367157561647619, -0.995447457890487, 0.468528939223885, 
-0.295017486543682, 1.5471259341995, 0.149609004075832, -1.37356176484758, 
-0.530120009280847), variable_B = c(0.909414301702991, -0.943449516247587, 
0.781524506629484, -0.600911949819743, -1.00324860948739, -0.879024346862077, 
0.148702302376609, 1.03680921377107, -0.818036316970879, -1.35838203594376, 
0.716858807832897, -0.828797106744759, -0.668603665623089, 1.21332444520722, 
1.35303750102937, -0.35838533300562, 0.586314940239296, 0.814575853257065, 
0.789492242485147, 1.19281997662545, -0.910500570977607, 0.179361003357409, 
-1.20694554173343, -0.865556160009293, -1.18138003775792, -0.460239810467593, 
-0.347479702200378, -0.376211854380412, 0.695718104445771, 0.311847367815062, 
0.422743354183219, -1.31000649682009, 0.688080878453372, -2.6910231963726, 
0.959527136748011, 0.816023979122475, 2.47961675089516, 0.417574474989369, 
-0.784547963068128, 0.660568497111298, -1.17583280451095, 0.910685709373682, 
0.0893873399996857, -0.519872558390938, -0.371391026023119, 1.16814083225284, 
0.942464870838783, 1.08748559762075, -0.156271108557384, 1.03813800412845, 
0.912245187233877, -0.864089336414513, -0.519872558390938, 0.322936960762913, 
0.395796664489034, -0.0134883824090008, 0.880364781167251, -1.02201375308388, 
0.946978950240417, -0.182339630757281, 0.566561585615522, -0.663533281364679, 
0.732415185925742, -0.466304170570451, -0.672450145062805, -0.519872558390938, 
-3.00315063163361, 1.86543741543252, -0.954446227371451, -1.15998107435428, 
0.824664738419366), variable_C = c(0.419730106958754, 0.20723938151934, 
-0.923643462344663, -0.840808094800484, 2.1592728253865, 0.569193922310206, 
-0.756171958396649, -0.963260377257096, -1.0695057399768, -0.606708143045197, 
-0.541880464097579, 0.484557785906372, 1.88555595871878, -0.891229622870853, 
-0.887628085151541, -0.511267393483426, -0.76517580269493, -0.974064990415032, 
-0.756171958396649, -0.939850382081567, -0.554485846115171, -0.165519772429464, 
-0.502263549185146, -0.367205884710942, 1.34892683854128, 1.49658988503307, 
-0.540079695237923, -0.448240483395464, -0.891229622870853, 0.333293201695263, 
-0.808394255326675, -0.70575043032628, -0.864218089976013, 0.171224004326219, 
-0.945252688660535, -0.56889199699242, -0.414025875061999, -0.675137359712127, 
-0.385213573307502, -0.806593486467019, 0.313484744239047, -1.02628728734506, 
-0.224945144798113, -0.781382722431834, -0.619313525062789, -0.644524289097974, 
-1.08571265971371, -0.990271910151936, -0.594102761027605, 3.22712875916254, 
-0.550884308395859, 0.506167012222245, 0.913140774504513, -0.597704298746917, 
-0.269964366289515, -0.657129671115566, -0.884026547432229, -0.356401271553005, 
-1.04069343822231, -0.00705211277973086, -0.462646634272713, 
0.08838863678204, -0.981268065853656, -0.414025875061999, -0.716555043484216, 
0.194633999501747, -0.619313525062789, -0.907436542607758, -0.775980415852866, 
-0.289772823745731, -0.597704298746917), variable_D = structure(c(2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 
1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 
1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 
1L, 1L, 1L, 1L, 1L, 2L), levels = c("old", "new"), class = "factor")), row.names = 97:167, class = "data.frame")

我假设变量_A 对变量_B 的响应取决于变量_C,并且这在变量_D 的级别之间有所不同。因此,我对所有变量的相互作用感兴趣。我安装了一个线性模型:

fit <- lm(variable_A ~ variable_B*variable_C*variable_D, data = dat)
summary(fit)

输出:

Call:
lm(formula = variable_A ~ variable_B * variable_C * variable_D, 
    data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.92931 -0.58884 -0.06014  0.56585  2.57492 

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)                          0.18132    0.15653   1.158 0.251063    
variable_B                           0.51698    0.13378   3.865 0.000266 ***
variable_C                           0.07710    0.16378   0.471 0.639432    
variable_Dnew                       -0.42993    0.27514  -1.563 0.123157    
variable_B:variable_C                0.14710    0.16839   0.874 0.385679    
variable_B:variable_Dnew            -0.18376    0.33968  -0.541 0.590441    
variable_C:variable_Dnew            -0.05801    0.35809  -0.162 0.871824    
variable_B:variable_C:variable_Dnew -0.78161    0.39110  -1.999 0.049982 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8798 on 63 degrees of freedom
Multiple R-squared:  0.3034,    Adjusted R-squared:  0.226 
F-statistic:  3.92 on 7 and 63 DF,  p-value: 0.001315

我想使用包

marginaeffects
来进一步研究重要的三向交互(
variable_B:variable_C:variable_Dnew
)。如何测试variable_A对variable_B的响应受variable_C影响的方向?我如何测试这在variable_D级别之间有何不同以及variable_D的影响对于哪些子集显着?

我设法绘制了模型预测,这给出了效应方向的初步印象。当变量_C 的值较高时(变量_C 用于分面),变量_B(在 x 轴上)和变量_A(在 y 轴上)之间的关系似乎与变量_D(用于着色)的“新”级别相反,但是不适合“旧”级别。我正在寻找一种通过统计支持这种可视化的方法,最好使用

marginal effects
包中的函数。

marginaleffects::plot_predictions(fit, condition = list("variable_B", "variable_D", "variable_C"))

enter image description here

r lm interaction marginal-effects r-marginaleffects
1个回答
0
投票

问题1:B对A的影响是否取决于C?

A
相对于
B
的斜率是治疗 B 的微小变化对结果 A 的“影响”的度量。使用
newdata
参数和
datagrid()
函数,您可以计算该斜率对于不同的
C
值。在这里,我们将
C
设置为其最小值和最大值:

library(marginaleffects)
slopes(fit, 
    variables = "variable_B", 
    newdata = datagrid(variable_C = range))
> 
>  variable_C Estimate Std. Error    z Pr(>|z|)   S    2.5 % 97.5 %
>       -1.09    0.357      0.181 1.98   0.0481 4.4  0.00304  0.712
>        3.23    0.992      0.607 1.63   0.1024 3.3 -0.19831  2.182

当C=-1.09时,C对A的“影响”等于0.357,当C=3.23时,C对A的“影响”等于0.992。这两个估计的效果是否不同?我们可以使用

hypothesis
参数进行检查:

slopes(fit, 
    hypothesis = "b2 - b1 = 0",
    variables = "variable_B", 
    newdata = datagrid(variable_C = range))
> 
>  Estimate Std. Error     z Pr(>|z|)   S  2.5 % 97.5 %
>     0.634      0.726 0.874    0.382 1.4 -0.789   2.06

两个斜率不同,但这种差异不具有统计显着性。

问题2:C的调节作用是否取决于D?

首先,计算我们关心的 C 和 D 的每个组合的斜率:

slopes(fit, 
    variables = "variable_B", 
    newdata = datagrid(variable_C = range, variable_D = unique))
> 
>  variable_C variable_D Estimate Std. Error     z Pr(>|z|)    S    2.5 % 97.5 %
>       -1.09        new    1.022      0.308  3.32   <0.001 10.1  0.41870  1.626
>       -1.09        old    0.357      0.181  1.98   0.0481  4.4  0.00304  0.712
>        3.23        new   -1.714      1.356 -1.26   0.2062  2.3 -4.37285  0.944
>        3.23        old    0.992      0.607  1.63   0.1024  3.3 -0.19831  2.182

然后,我们可以更进一步,计算一些差异,然后是差异中的差异:

# Difference in Effect of B on A, when C is high or low, holding D=new
slopes(fit, 
    hypothesis = "b3 - b1 = 0",
    variables = "variable_B", 
    newdata = datagrid(variable_C = range, variable_D = unique))
> 
>  Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
>     -2.74       1.52 -1.8   0.0723 3.8 -5.72  0.247

# Difference in Effect of B on A, when C is high or low, holding D=old
slopes(fit, 
    hypothesis = "b4 - b2 = 0",
    variables = "variable_B", 
    newdata = datagrid(variable_C = range, variable_D = unique))
> 
>  Estimate Std. Error     z Pr(>|z|)   S  2.5 % 97.5 %
>     0.634      0.726 0.874    0.382 1.4 -0.789   2.06

# Difference in differences
slopes(fit, 
    hypothesis = "(b4 - b2) - (b3 - b1) = 0",
    variables = "variable_B", 
    newdata = datagrid(variable_C = range, variable_D = unique))
> 
>  Estimate Std. Error z Pr(>|z|)   S 2.5 % 97.5 %
>      3.37       1.69 2   0.0457 4.5 0.065   6.68
> 
© www.soinside.com 2019 - 2024. All rights reserved.