Background and my attempt
Suppose I have a model for a response with 2 numeric covariates and 1 categorical covariate with 4 factor levels:
set.seed(1)
Y <- sample(100)
n <- 100
X1 <- sample(n)
X2 <- sample(n)
X3 <- as.factor(rep(c("A", "B", "C", "D"), n/4))
model <- lm(Y ~ X1 + X2 + X3)
If I use the anova function, I get the following result:
> anova(model)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value Pr(>F)
X1         1   1029 1028.85  1.1896 0.2782
X2         1    645  645.41  0.7462 0.3899
X3         3    351  116.87  0.1351 0.9389
Residuals 94  81300  864.89
I can also fit the model with aov (and pass it to anova or summary) to get a similar result:
model_aov <- aov(Y ~ X1 + X2 + X3)
> anova(model_aov)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value Pr(>F)
X1         1   1029 1028.85  1.1896 0.2782
X2         1    645  645.41  0.7462 0.3899
X3         3    351  116.87  0.1351 0.9389
Residuals 94  81300  864.89
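Desired result
I would like to pool all covariates into a single row giving the regression sum of squares (SSR).
Is it possible to obtain an ANOVA table like the following: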
Response: Y
             Df Sum Sq Mean Sq F value Pr(>F)
X1 + X2 + X3  5   2025  405.00       f      p
Residuals    94  81300  864.89
where f and p are computed by R?
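For reference, the f in the pooled row would presumably be the usual overall regression F statistic: the pooled mean square divided by the residual mean square. A rough check by hand, using only the rounded values from the tables above:

```latex
F = \frac{\mathrm{SSR}/5}{\mathrm{SSE}/94}
  \approx \frac{2025/5}{81300/94}
  = \frac{405}{864.89}
  \approx 0.468
```

The p-value is then the upper tail of an F distribution with 5 and 94 degrees of freedom at that value.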
Something along these lines?
library(dplyr)
library(broom)
library(rlang) ## for convenient extraction of covariate labels
anova(model) |>
  tidy() |>
  ## pool all covariate rows into one group, the residual row into another
  group_by(is_res = term == 'Residuals') |>
  summarize(Df = sum(df),
            sumsq = sum(sumsq),
            ## pooled mean square = pooled sum of squares / pooled df
            ## (not the average of the per-term mean squares)
            meansq = sumsq / Df
            ) |>
  mutate(term = ifelse(is_res, 'Residuals', deparse(f_rhs(formula(model)))),
         .before = 1) |>
  select(-is_res) |>
  ungroup() |>
  mutate(F = meansq / lead(meansq),
         p = pf(F, Df, lead(Df), lower.tail = FALSE)
         )
Output:
# A tibble: 2 × 6
  term            Df  sumsq meansq      F      p
  <chr>        <int>  <dbl>  <dbl>  <dbl>  <dbl>
1 X1 + X2 + X3     5  2025.   405.  0.468  0.799
2 Residuals       94 81300.   865. NA     NA
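A base-R alternative, offered as a sketch rather than as part of the answer above: comparing the full model against an intercept-only model with anova() should pool all covariates into one row, and that row's F statistic should agree with the overall F reported by summary(). The names full, null, cmp and pooled are made up for this illustration; the data are re-simulated as in the question.

```r
# Same simulated data as in the question
set.seed(1)
n  <- 100
Y  <- sample(n)
X1 <- sample(n)
X2 <- sample(n)
X3 <- as.factor(rep(c("A", "B", "C", "D"), n / 4))

full <- lm(Y ~ X1 + X2 + X3)  # all covariates
null <- lm(Y ~ 1)             # intercept only

# Nested-model comparison: row 2 holds the pooled regression term
cmp <- anova(null, full)

# Rearrange into the two-row layout asked for (hypothetical helper table)
pooled <- data.frame(
  Df        = c(cmp[2, "Df"], cmp[2, "Res.Df"]),
  `Sum Sq`  = c(cmp[2, "Sum of Sq"], cmp[2, "RSS"]),
  `F value` = c(cmp[2, "F"], NA),
  `Pr(>F)`  = c(cmp[2, "Pr(>F)"], NA),
  check.names = FALSE,
  row.names = c("X1 + X2 + X3", "Residuals")
)
# Mean squares are sums of squares divided by their degrees of freedom
pooled$`Mean Sq` <- pooled$`Sum Sq` / pooled$Df
pooled <- pooled[, c("Df", "Sum Sq", "Mean Sq", "F value", "Pr(>F)")]
print(pooled)

# Cross-check against the overall F statistic of the fitted model
fs <- summary(full)$fstatistic  # named vector: value, numdf, dendf
all.equal(pooled$`F value`[1], unname(fs["value"]))
```

The result is a plain data.frame rather than an object of class anova, so it prints without the usual "Analysis of Variance Table" header.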