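My categorical covariate (ethnicity, with 6 categories) is significant. I know that lm tests the first category (1) against each of the others (2, 3, 4, 5, 6). I swapped the codes of the sixth and first categories to see the effect of category 1, but then it became non-significant.
What did I do wrong? I want to know the effect of ethnicity in my regression, and I would hope that significance does not depend on the order of the item codes.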
x=ForkinYak
##Fixed Effects
##Covariates
CoAge = x$Age
CoVPSex = factor(x$Gender, levels = c(1,2,3))
CoEdu = factor(x$Education, levels = c(1,2,3,4,5,6))
CoCDoc = x$Frequency
CoEth = factor(x$Ethnicity, levels = c(1,2,3,4,5))
CoPrefAlt = factor(x$Alt_Code)
CoPref = factor(x$Code)
CoEthSwapWhiteOthers = factor(x$WhiteEthnicity, levels = c(1,2,3,4,5))
Pos= factor(x$Posture)
Sex= factor(x$Sex)
contrasts(Pos) <- -1*contr.sum(2)
contrasts(Sex) <- -1*contr.sum(2)
model <- lm(Rating ~ Pos*Sex + CoEth , data = x)
summary(model)
###Results
> model <- lm(Rating ~ Pos*Sex + CoEth , data = x)
> summary(model)
Call:
lm(formula = Rating ~ Pos * Sex + CoEth, data = x)
Residuals:
    Min      1Q  Median      3Q     Max
-2.8534 -0.9356  0.1288  1.1599  2.6399

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.52145    0.17994  19.570  < 2e-16 ***
Pos1         0.16138    0.15689   1.029 0.305232
SexM         0.24233    0.22481   1.078 0.282709
CoEth2       1.63913    0.45748   3.583 0.000451 ***
CoEth3       0.90006    0.55872   1.611 0.109178
CoEth4       1.17054    0.24559   4.766 4.21e-06 ***
CoEth5       0.12875    1.02912   0.125 0.900597
Pos1:SexM   -0.05391    0.22520  -0.239 0.811120
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.416 on 159 degrees of freedom
Multiple R-squared: 0.1867, Adjusted R-squared: 0.1509
F-statistic: 5.216 on 7 and 159 DF, p-value: 2.257e-05
model <- lm(Rating ~ Pos*Sex + CoEthSwapWhiteOthers , data = x)
summary(model)
####Results, when Codes of 1 and 6 are swapped
> model <- lm(Rating ~ Pos*Sex + CoEthSwapWhiteOthers , data = x)
> summary(model)
Call:
lm(formula = Rating ~ Pos * Sex + CoEthSwapWhiteOthers, data = x)
Residuals:
    Min      1Q  Median      3Q     Max
-2.8534 -0.9356  0.1288  1.1599  2.6399

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)            3.65020    1.03527   3.526 0.000552 ***
Pos1                   0.16138    0.15689   1.029 0.305232
SexM                   0.24233    0.22481   1.078 0.282709
CoEthSwapWhiteOthers2  1.51038    1.09425   1.380 0.169438
CoEthSwapWhiteOthers3  0.77131    1.14505   0.674 0.501540
CoEthSwapWhiteOthers4  1.04179    1.03651   1.005 0.316379
CoEthSwapWhiteOthers5 -0.12875    1.02912  -0.125 0.900597
Pos1:SexM             -0.05391    0.22520  -0.239 0.811120
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.416 on 159 degrees of freedom
Multiple R-squared: 0.1867, Adjusted R-squared: 0.1509
F-statistic: 5.216 on 7 and 159 DF, p-value: 2.257e-05
DATA
# first 20 rows
structure(list(Posture = c("Closed", "Closed", "Closed", "Closed",
"Closed", "Closed", "Closed", "Closed", "Closed", "Closed", "Closed",
"Closed", "Closed", "Closed", "Closed", "Closed", "Closed", "Closed",
"Closed", "Closed"), Sex = c("M", "M", "M", "M", "M", "M", "M",
"M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M"
), Rating = c(5, 5, 4, 2, 5, 6, 4, 4, 3, 5, 3, 6, 6, 5, 4, 4,
4, 3, 2, 1), Ethnicity = c(1, 1, 4, 4, 1, 4, 1, 1, 1, 1, 4, 1,
4, 2, 1, 1, 1, 1, 1, 1), WhiteEthnicity = c(5, 5, 4, 4, 5, 4,
5, 5, 5, 5, 4, 5, 4, 2, 5, 5, 5, 5, 5, 5)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
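Reordering the categories does not change your model. It only changes which specific contrasts get an estimate and a significance test. (Note that the estimate of the Pos1 coefficient is exactly the same in both models, as are the residuals, the R-squared and the overall F-statistic.)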
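To make this concrete, here is a minimal sketch on simulated data (the group sizes, effect sizes and the names `d`, `eth`, `eth_ref5` are made up for illustration; this is not your data). It fits the same one-factor model with two different reference levels and checks that the fit is identical while the individual coefficients differ.

```r
# Simulated stand-in data; sizes and effects are assumptions, not the asker's data.
set.seed(1)
n_per_group <- c(60, 10, 8, 40, 3)            # group 5 is deliberately tiny
eth <- factor(rep(1:5, times = n_per_group))
rating <- 3.5 + c(0, 1.6, 0.9, 1.2, 0.1)[as.integer(eth)] +
  rnorm(length(eth), sd = 1.4)
d <- data.frame(rating = rating, eth = eth)

# Same factor, but with level "5" moved to the front as the reference level
d$eth_ref5 <- relevel(d$eth, ref = "5")

fit_ref1 <- lm(rating ~ eth, data = d)        # contrasts: each level vs. level 1
fit_ref5 <- lm(rating ~ eth_ref5, data = d)   # contrasts: each level vs. level 5

# The model itself is unchanged: same fitted values, same R-squared
all.equal(unname(fitted(fit_ref1)), unname(fitted(fit_ref5)))
all.equal(summary(fit_ref1)$r.squared, summary(fit_ref5)$r.squared)

# Only the reported contrasts differ
coef(summary(fit_ref1))
coef(summary(fit_ref5))
```

The "5 vs. 1" contrast appears in both fits with the same magnitude and opposite sign, just as CoEth5 (0.12875) and CoEthSwapWhiteOthers5 (-0.12875) do in your two outputs.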
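If one group (say, group 6) does not differ significantly from the other groups, while those groups (say, groups 1 and 4) do differ from each other, then there may simply be too few observations in group 6 to show that it differs from 1 or 4.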
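A quick way to check this is to tabulate the group sizes, e.g. `table(x$Ethnicity)`. The sketch below uses simulated data again (all names and numbers are assumptions) to show why a small reference group inflates every standard error: in a one-factor model with treatment contrasts, the standard error of "level j vs. reference" is sigma * sqrt(1/n_j + 1/n_ref), so it can never fall below sigma / sqrt(n_ref).

```r
# Simulated stand-in data; sizes and effects are assumptions for illustration.
set.seed(2)
n_per_group <- c(60, 10, 8, 40, 3)            # group 5 has only 3 observations
eth <- factor(rep(1:5, times = n_per_group))
rating <- 3.5 + c(0, 1.6, 0.9, 1.2, 0.1)[as.integer(eth)] +
  rnorm(length(eth), sd = 1.4)
d <- data.frame(rating = rating, eth = eth)
d$eth_ref5 <- relevel(d$eth, ref = "5")

table(d$eth)                                   # inspect group sizes first

fit_ref1 <- lm(rating ~ eth, data = d)         # large group as reference
fit_ref5 <- lm(rating ~ eth_ref5, data = d)    # tiny group as reference

# Standard errors of the ethnicity contrasts (intercept row dropped)
se_ref1 <- coef(summary(fit_ref1))[-1, "Std. Error"]
se_ref5 <- coef(summary(fit_ref5))[-1, "Std. Error"]

# Closed-form standard errors for comparison: sigma * sqrt(1/n_j + 1/n_ref)
sigma_hat <- summary(fit_ref1)$sigma
se_ref1_formula <- sigma_hat * sqrt(1 / n_per_group[-1] + 1 / n_per_group[1])
se_ref5_formula <- sigma_hat * sqrt(1 / n_per_group[-5] + 1 / n_per_group[5])

rbind(se_ref1, se_ref1_formula)
rbind(se_ref5, se_ref5_formula)
```

This pattern looks consistent with your output: after the swap every ethnicity contrast has a standard error above 1, about the same as the one for CoEth5 in the first model, which suggests the new reference group is small.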
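Neither of your models specifically tests whether ethnicity as a whole has a significant effect on the outcome variable. To test that, compare the model that includes ethnicity with a model that omits it and check whether the fit improves. For example: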
model <- lm(Rating ~ Pos*Sex + CoEth , data = x)
model2 <- lm(Rating ~ Pos*Sex, data = x)
anova(model, model2)
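You should see the same result no matter what order the covariate's levels are in. The individual contrasts in the summary.lm output, however, will differ.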
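As a sanity check, the sketch below (simulated data; `pos`, `eth`, `eth_ref5`, the "Open" level and all numbers are hypothetical) runs the nested-model comparison under both level orderings and confirms that the F-statistic and p-value for ethnicity come out the same.

```r
# Simulated stand-in data; variable names and values are assumptions.
set.seed(3)
n_per_group <- c(60, 10, 8, 40, 3)
eth <- factor(rep(1:5, times = n_per_group))
pos <- factor(sample(c("Closed", "Open"), length(eth), replace = TRUE))
rating <- 3.5 + c(0, 1.6, 0.9, 1.2, 0.1)[as.integer(eth)] +
  rnorm(length(eth), sd = 1.4)
d <- data.frame(rating = rating, pos = pos, eth = eth)
d$eth_ref5 <- relevel(d$eth, ref = "5")        # same factor, different reference

reduced   <- lm(rating ~ pos, data = d)              # model without ethnicity
full_ref1 <- lm(rating ~ pos + eth, data = d)        # reference = level 1
full_ref5 <- lm(rating ~ pos + eth_ref5, data = d)   # reference = level 5

# Overall F-test for ethnicity (4 degrees of freedom) under each ordering
a1 <- anova(reduced, full_ref1)
a5 <- anova(reduced, full_ref5)

c(F_ref1 = a1$F[2], F_ref5 = a5$F[2])
c(p_ref1 = a1[2, "Pr(>F)"], p_ref5 = a5[2, "Pr(>F)"])
```

Both comparisons test the same hypothesis (all ethnicity group means are equal, given the other terms), which is why the coding of the reference level drops out.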
Finally, you can use the emmeans package to look at different contrasts from the model without manually swapping factor levels. For example:
pairs(emmeans::emmeans(model, "CoEth"))