包含交互的 GLM 中的相对重要性/变异划分

Question

我有一个关于变量的相对重要性的问题，在包含交互作用（连续 * 因子）的 GLM 中。

我正在尝试一种基于对解释的变化进行分区的方法，通过（伪）-R 平方进行近似。但我不确定如何 (1) 在 GLM 中，以及 (2) 使用包含交互的模型。

为了简单起见，我准备了一个带有单次交互的 Guassian GLM 的示例模型（使用 mtcars 数据集，请参阅文章末尾的代码）。但我实际上感兴趣的是将该方法应用于广义泊松 GLM，它可能包含多个交互。测试模型出现了一些问题：

如何正确划分 R 方？ 我尝试过划分，但不确定这是否是正确的方法。
每一项的 r 平方不等于整个模型的 r 平方（甚至不接近）。 对于不包含交互的模型，也会发生这种情况。除了 r 平方划分中的错误之外（我仍然认为自己是统计新手：P）；这也会受到共线性的影响吗？缩放连续预测变量后，方差膨胀因子低于 3（未缩放的模型的 VIF 最高 = 5.7）。

非常感谢任何帮助！


library(tidyverse)
library(rsq)
library(car)

data <- mtcars %>%
  # scale reduces collinearity: without standardizing, the variance inflation factor for the factor is 5.7
  mutate(disp = scale(disp))
data$am <- factor(data$am)

summary(data)

# test model, continuous response (miles per gallon), type of transmission (automatic/manual) as factor, displacement as continuous
model <-
  glm(mpg ~ am + disp + am:disp,
      data = data,
      family = gaussian(link = "identity"))
drop1(model, test = "F")

# graph the data
ggplot(data = data, aes(x = disp, y = mpg, col = am)) + geom_jitter() + geom_smooth(method = "glm")

# Attempted partitioning
(rsq_full <- rsq::rsq(model, adj = TRUE, type = "v"))

(rsq_int <- rsq_full - rsq::rsq(update(model, . ~ . - am:disp), adj = TRUE, type = "v"))

(rsq_factor <- rsq_full - rsq::rsq(update(model, . ~ . - am - am:disp), adj = TRUE, type = "v"))

(rsq_cont <- rsq_full - rsq::rsq(update(model, . ~ . - disp - am:disp), adj = TRUE, type = "v"))

c(rsq_full, rsq_int + rsq_factor + rsq_cont)

car::vif(model)


# A simpler model with no interaction
model2 <- glm(mpg ~ am + disp, data = data, family = gaussian(link = "identity"))
drop1(model2, test = "F")

(rsq_full2 <- rsq::rsq(model2, adj = TRUE, type = "v"))
(rsq_factor2 <- rsq_full2 - rsq::rsq(update(model2, . ~ . - am), adj = TRUE, type = "v"))
(rsq_cont2 <- rsq_full2 - rsq::rsq(update(model2, . ~ . - disp), adj = TRUE,type = "v"))

c(rsq_full2, rsq_factor2 + rsq_cont2)

car::vif(model2)

Answer 1

鉴于：

```
y = A + B + A * B
```

我会将其 R 平方值与其简单版本的 R 平方值进行比较：

```
y = A + B
```
```
y = A
```
```
y = B
```

如果没有互动，我预计

r-squared(model1) = r-squared(model2)

这应该适用于任何线性模型。即使存在交互作用，它对于比较预测变量的主要效果也应该很有用。我知道这是有争议的，但如果你看看下图所示的场景，就会发现只有考虑到预测变量 B 时，预测变量 A 才提供信息；相反，预测变量 B 甚至本身也具有一定的预测能力（B1 的 y 高于 B2 的 y，无论它们属于 A 的级别）。

这是一个模拟数据的示例（以避免共线性和非正态性问题）：

# simulate data:
df <- data.frame(Species = as.factor(c(rep("Species A", 200),
                            rep("Species B", 200)
                            )),
                Treatment = as.factor(rep(c("diet 1", "diet 2","diet 1", "diet 2"), each=100)),
                body.weight = c(rnorm(n=100, 30, 5),
                                rnorm(n=100, 29.9, 5),
                                rnorm(n=100, 55, 5),
                                rnorm(n=100, 90, 5)
                                )
                )

# Let's fit and compare the alternative models:
lm.interactive <- lm(body.weight ~ Species * Treatment, data=df)
lm.additive <- lm(body.weight ~ Species + Treatment, data=df)
lm.only.species <- lm(body.weight ~ Species, data=df)
lm.only.Treatment <- lm(body.weight ~ Treatment, data=df)
lm.null <- lm(body.weight ~ 1, data=df)

# obtain R^2:


summary(lm.only.Treatment)$adj.r.squared # main effect of Treatment
summary(lm.only.species)$adj.r.squared # main effect of species ID. 
# As the figure suggests, it's larger than the main effect of Treatment 
# (species identity affects body weight regardless of treatment)
summary(lm.additive)$adj.r.squared # sum of the main effects
summary(lm.interactive)$adj.r.squared # main effects + interaction

# fraction of variance explained by the interaction alone:
summary(lm.interactive)$adj.r.squared - summary(lm.additive)$adj.r.squared

我不确定我们是否真的可以谈论“单独由交互作用解释的方差分数”。由于包含交互项，谈论解释方差的增加可能更合适。我不确定我建议的方法在统计上有多合理、其局限性，或者它是否适用于不平衡的数据集。这种方法的一个问题是，鉴于每个模型只有一个 R 平方值，因此无法对 R 平方的差异进行统计检验。解决这个问题的一种方法是使用 bootstrapping 获得每个模型的 R 平方值的分布。

Answer 2

处理交互相对重要性的一些策略见：LeBreton, J. M., Tonidandel, S., & Krasikova, D. V. (2013)。残差相对重要性分析：高阶回归模型中方差综合分解的技术。组织研究方法，16(3), 449-473.

他们讨论了添加交互作用时计算 R2 增加的策略。此外，相互作用对“单独”R2 的贡献可以通过残差来完成：采用回归的残差，将变量之间的乘积建模为变量加性效应的函数：

https://github.com /jluchman/domir/讨论/5

由此产生的残差交互作用可用于优势分析，该分析采用模型，并将 R 平方（或其他模型质量指标，如 AUC）拆分为每个自变量的独立贡献。它可以通过 domir 包在 R 中实现 (

https://github.com/jluchman/domir

)。

Answer 3

https://academic.oup.com/jpe/article/15/6/1302/6845730

https://academic.oup.com/jpe/article/ 16/6/rtad038/7444979 安装.packages（“glmm.hp”）

图书馆（glmm.hp）

图书馆（MuMIn）

图书馆（lme4）

mod1

r.squaredGLMM(mod1)<- lmer(Sepal.Length ~ Petal.Length + Petal.Width+ Petal.Length:Petal.Width+(1|Species),data = iris)

glmm.hp(mod1)

一个

情节（一）<- glmm.hp(mod1)

包含交互的 GLM 中的相对重要性/变异划分

问题描述投票：0回答：3

3个回答

最新问题

包含交互的 GLM 中的相对重要性/变异划分

问题描述 投票：0回答：3

3个回答

最新问题

问题描述投票：0回答：3