How to interpret a categorical covariate in a linear model


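My categorical covariate (ethnicity, with 6 categories) is significant. I understand that lm tests the first category (1) against each of the other categories (2, 3, 4, 5, 6). I swapped the sixth and first categories to see the effect of the first category, but then it became non-significant.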

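What am I doing wrong? I want to know the effect of ethnicity in my regression, and I would hope that its significance does not depend on the order in which the categories are coded.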

x=ForkinYak


##Fixed Effects
##Covariates

CoAge = x$Age
CoVPSex = factor(x$Gender, levels = c(1,2,3))
CoEdu = factor(x$Education, levels = c(1,2,3,4,5,6))
CoCDoc = x$Frequency
CoEth = factor(x$Ethnicity, levels = c(1,2,3,4,5))
CoPrefAlt = factor(x$Alt_Code)
CoPref = factor(x$Code)
CoEthSwapWhiteOthers = factor(x$WhiteEthnicity, levels = c(1,2,3,4,5))


Pos= factor(x$Posture)
Sex= factor(x$Sex)

contrasts(Pos) <- -1*contr.sum(2)
contrasts(Sex) <- -1*contr.sum(2)

model <- lm(Rating ~ Pos*Sex +  CoEth , data = x)
summary(model)



###Results
> model <- lm(Rating ~ Pos*Sex +  CoEth , data = x)
> summary(model)

Call:
lm(formula = Rating ~ Pos * Sex + CoEth, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8534 -0.9356  0.1288  1.1599  2.6399 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.52145    0.17994  19.570  < 2e-16 ***
Pos1         0.16138    0.15689   1.029 0.305232    
SexM         0.24233    0.22481   1.078 0.282709    
CoEth2       1.63913    0.45748   3.583 0.000451 ***
CoEth3       0.90006    0.55872   1.611 0.109178    
CoEth4       1.17054    0.24559   4.766 4.21e-06 ***
CoEth5       0.12875    1.02912   0.125 0.900597    
Pos1:SexM   -0.05391    0.22520  -0.239 0.811120    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.416 on 159 degrees of freedom
Multiple R-squared:  0.1867,    Adjusted R-squared:  0.1509 
F-statistic: 5.216 on 7 and 159 DF,  p-value: 2.257e-05


model <- lm(Rating ~ Pos*Sex +  CoEthSwapWhiteOthers , data = x)
summary(model)


####Results, when Codes of 1 and 6 are swapped
> model <- lm(Rating ~ Pos*Sex +  CoEthSwapWhiteOthers , data = x)
> summary(model)

Call:
lm(formula = Rating ~ Pos * Sex + CoEthSwapWhiteOthers, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8534 -0.9356  0.1288  1.1599  2.6399 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)            3.65020    1.03527   3.526 0.000552 ***
Pos1                   0.16138    0.15689   1.029 0.305232    
SexM                   0.24233    0.22481   1.078 0.282709    
CoEthSwapWhiteOthers2  1.51038    1.09425   1.380 0.169438    
CoEthSwapWhiteOthers3  0.77131    1.14505   0.674 0.501540    
CoEthSwapWhiteOthers4  1.04179    1.03651   1.005 0.316379    
CoEthSwapWhiteOthers5 -0.12875    1.02912  -0.125 0.900597    
Pos1:SexM             -0.05391    0.22520  -0.239 0.811120    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.416 on 159 degrees of freedom
Multiple R-squared:  0.1867,    Adjusted R-squared:  0.1509 
F-statistic: 5.216 on 7 and 159 DF,  p-value: 2.257e-05

DATA

# first 20 rows

structure(list(Posture = c("Closed", "Closed", "Closed", "Closed", 
"Closed", "Closed", "Closed", "Closed", "Closed", "Closed", "Closed", 
"Closed", "Closed", "Closed", "Closed", "Closed", "Closed", "Closed", 
"Closed", "Closed"), Sex = c("M", "M", "M", "M", "M", "M", "M", 
"M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M"
), Rating = c(5, 5, 4, 2, 5, 6, 4, 4, 3, 5, 3, 6, 6, 5, 4, 4, 
4, 3, 2, 1), Ethnicity = c(1, 1, 4, 4, 1, 4, 1, 1, 1, 1, 4, 1, 
4, 2, 1, 1, 1, 1, 1, 1), WhiteEthnicity = c(5, 5, 4, 4, 5, 4, 
5, 5, 5, 5, 4, 5, 4, 2, 5, 5, 5, 5, 5, 5)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))
1 Answer

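Reordering the categories does not change your model. It only changes which specific contrasts have their estimates and statistical significance reported. (Note that the Pos1 estimate, the residuals, R-squared and the F-statistic are exactly the same in both models.)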
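To make this concrete, here is a minimal sketch on simulated data. The variable names, group sizes and effect sizes are assumptions made up for illustration and are not taken from the question's data set. It fits the same model twice, once with level 1 and once with level 5 as the reference level.

```r
# Simulated data; names, group sizes and effects are illustrative assumptions.
set.seed(1)
n_per_level <- c(60, 10, 8, 40, 3)            # level 5 is deliberately tiny
eth <- factor(rep(1:5, times = n_per_level))
pos <- factor(sample(c("Closed", "Open"), length(eth), replace = TRUE))
rating <- 3.5 + c(0, 1.5, 1, 1, 0)[eth] + rnorm(length(eth))
d <- data.frame(rating, pos, eth)

# Same factor, but with level 5 as the reference instead of level 1.
d$eth_ref5 <- relevel(d$eth, ref = "5")

fit_ref1 <- lm(rating ~ pos + eth, data = d)
fit_ref5 <- lm(rating ~ pos + eth_ref5, data = d)

# The fit itself is identical: same fitted values, same R-squared.
all.equal(fitted(fit_ref1), fitted(fit_ref5))
all.equal(summary(fit_ref1)$r.squared, summary(fit_ref5)$r.squared)

# Coefficients unrelated to the releveled factor are unchanged too.
all.equal(coef(fit_ref1)["posOpen"], coef(fit_ref5)["posOpen"])

# Only the ethnicity rows differ, because they now compare against level 5.
coef(summary(fit_ref1))
coef(summary(fit_ref5))
```

The "5 vs 1" comparison appears in both tables with the sign flipped, just as CoEth5 and CoEthSwapWhiteOthers5 do in the question's output.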

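If one group (e.g. group 6) does not differ significantly from the other groups, while those other groups (e.g. groups 1 and 4) do differ from each other, then group 6 probably contains too few observations to show a difference from 1 or 4. In your second output every ethnicity contrast has a standard error of about 1.0 or more, which is consistent with a small reference group.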
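The effect of a small reference group on the standard errors can be sketched as follows. The data are again simulated under assumed group sizes (3 observations in level 5), so the numbers only illustrate the mechanism and say nothing about the real data.

```r
# Simulated data; group sizes and effects are illustrative assumptions.
set.seed(2)
eth <- factor(rep(1:5, times = c(60, 10, 8, 40, 3)))
rating <- 3.5 + c(0, 1.5, 1, 1, 0)[eth] + rnorm(length(eth))

# Observations per level: worth checking before reading the contrasts.
table(eth)

# Standard errors of the ethnicity contrasts under each reference level
# (the first row, the intercept, is dropped).
se_ref1 <- coef(summary(lm(rating ~ eth)))[-1, "Std. Error"]
se_ref5 <- coef(summary(lm(rating ~ relevel(eth, ref = "5"))))[-1, "Std. Error"]

round(se_ref1, 2)   # large reference group: mostly small standard errors
round(se_ref5, 2)   # tiny reference group: every standard error is large
```

Every contrast against the tiny group inherits that group's uncertainty, so none of them is likely to look significant even when the other groups clearly differ from each other.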

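Neither of your models specifically tests whether 'ethnicity' as a whole has a significant effect on the outcome variable. To test that, compare the model that includes ethnicity with a model that omits it, and check whether the fit improves. For example: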

model <- lm(Rating ~ Pos*Sex +  CoEth , data = x)
model2 <- lm(Rating ~ Pos*Sex, data = x)
anova(model, model2)

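You should see the same result from this comparison regardless of the order of the covariate's levels. The individual contrasts in the summary.lm output, however, will differ.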
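As a quick check of that claim, the sketch below runs the model comparison under two different reference levels. The data are simulated and all names are hypothetical, so only the equality of the two tests matters, not the values themselves.

```r
# Simulated data; names, group sizes and effects are illustrative assumptions.
set.seed(3)
eth <- factor(rep(1:5, times = c(60, 10, 8, 40, 3)))
sex <- factor(sample(c("F", "M"), length(eth), replace = TRUE))
rating <- 3.5 + c(0, 1.5, 1, 1, 0)[eth] + rnorm(length(eth))
d <- data.frame(rating, sex, eth, eth_ref5 = relevel(eth, ref = "5"))

reduced   <- lm(rating ~ sex, data = d)             # without ethnicity
full_ref1 <- lm(rating ~ sex + eth, data = d)       # reference level 1
full_ref5 <- lm(rating ~ sex + eth_ref5, data = d)  # reference level 5

# F test for adding ethnicity (4 degrees of freedom for 5 levels).
test_ref1 <- anova(reduced, full_ref1)
test_ref5 <- anova(reduced, full_ref5)

# The overall test does not depend on which level is the reference.
all.equal(test_ref1[2, "F"], test_ref5[2, "F"])
all.equal(test_ref1[2, "Pr(>F)"], test_ref5[2, "Pr(>F)"])

# One-call alternative: F test for each term that can be dropped.
drop1(full_ref1, test = "F")
```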

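Finally, you can use the emmeans package to look at different contrasts from the same model without manually swapping factor levels. For example: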

pairs(emmeans::emmeans(model, "CoEth"))