Backward stepwise regression with fixed effects in R

Question

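I am trying to run a backward stepwise regression in R, but I am not sure how to get what I want.

Here is an example of what I would like to do: using the iris dataset, I fit a simple regression with a fixed effect for species: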

lm(Petal.Length ~ Petal.Width + Sepal.Width + Sepal.Length + as.factor(Species), data = iris)

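I want to run something similar to a backward stepwise regression, but instead of dropping variables I want to drop categories from Species (that is, subsample the data) until my variables are no longer significant.

How can I do this?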

Answer 1

I got the result I wanted with a couple of loops, but I hope there is a more efficient or simpler solution:

# Load the only package the code below actually uses
library(dplyr)

# Assuming iris is the dataframe 
# Model: lm(Petal.Length ~ Petal.Width + Sepal.Width + Sepal.Length + as.factor(Species), data = iris)

# Define a function to run the regression model and return the results
run_model <- function(data, formula) {
  model <- lm(formula, data = data)
  # Extract model summary
  summary_model <- summary(model)
  # Extract coefficients, p-values, and other details
  coefficients <- summary_model$coefficients
  # Return a list holding the coefficient table
  return(list(coefficients = coefficients))
}

# Store initial model results for the full model
full_model_results <- run_model(iris, 
                                Petal.Length ~ Petal.Width + Sepal.Width + Sepal.Length + as.factor(Species))

# Initialize a list to store results
results <- list(full_model = full_model_results)

# Get the unique levels of Species
species <- unique(iris$Species)

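# Loop to drop one category at a time (i = number of species dropped)
for (i in seq_len(length(species) - 1)) {
  # Subset data to exclude the first i species
  # (seq_len(i) avoids the 1:0 pitfall, where species[1:0] selects species[1])
  subset_data <- dplyr::filter(iris, !Species %in% species[seq_len(i)])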
  
  # Check if there are at least two levels of Species left
  if (length(unique(subset_data$Species)) < 2) {
    next
  }
  
  # Run the regression model on the subset data
  current_results <- run_model(subset_data, 
                               Petal.Length ~ Petal.Width + Sepal.Width + Sepal.Length + as.factor(Species))
  
  # Store the results with a descriptive name
  results[[paste0("drop_", i, "_species")]] <- current_results
}

# Format and display the results in a more readable format
formatted_results <- lapply(results, function(res) {
  list(
    Coefficients = as.data.frame(res$coefficients)
  )
})
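
The loops above can be condensed, and an explicit stopping rule added, roughly as follows. This is only a sketch under stated assumptions: the helper name `drop_until_nonsignificant`, the choice of `Petal.Width` as the term to watch, the 0.05 threshold, and the order in which species are dropped are all hypothetical choices for illustration, not something the question specifies.

```r
# Hypothetical helper: drop Species levels one at a time (in factor-level order)
# until the p-value of `term` rises above `alpha`, or only two levels remain.
drop_until_nonsignificant <- function(data, term = "Petal.Width", alpha = 0.05) {
  levels_left <- levels(droplevels(data$Species))
  dropped <- character(0)
  repeat {
    # Keep only the rows belonging to the species still in play
    sub <- data[data$Species %in% levels_left, , drop = FALSE]
    # lm() drops unused factor levels when it builds the model frame
    fit <- lm(Petal.Length ~ Petal.Width + Sepal.Width + Sepal.Length + Species,
              data = sub)
    p <- summary(fit)$coefficients[term, "Pr(>|t|)"]
    # Stop when the term is no longer significant, or when dropping another
    # level would leave fewer than two species for the fixed effect
    if (p > alpha || length(levels_left) <= 2) break
    dropped <- c(dropped, levels_left[1])
    levels_left <- levels_left[-1]
  }
  list(dropped = dropped, kept = levels_left, p_value = p, model = fit)
}

res <- drop_until_nonsignificant(iris)
res$dropped   # species removed before stopping
res$p_value   # p-value of the watched term in the final model
```

Because the result depends on the order in which levels are removed, a different ordering may stop at a different subset.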

Answer 2

Please don't do this. Stepwise procedures for finding the "best" model are almost universally a bad idea.
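
One way to see the problem is a small simulation, sketched here under assumed settings (the sample size, number of predictors, and seed are arbitrary). The response is pure noise, unrelated to any predictor, yet backward selection with base R's `step()` will usually retain some predictors, and the p-values reported for the survivors tend to look more convincing than they should because they ignore the selection step.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
n <- 100                                      # assumed sample size
p <- 20                                       # assumed number of noise predictors

X <- as.data.frame(matrix(rnorm(n * p), n, p))  # columns V1..V20
X$y <- rnorm(n)                               # response independent of every predictor

full <- lm(y ~ ., data = X)
# Backward elimination by AIC; trace = 0 silences the step-by-step log
reduced <- step(full, direction = "backward", trace = 0)

# p-values reported for the surviving terms (intercept row excluded)
pvals <- summary(reduced)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
pvals
```

Any predictor that appears in `pvals` is a false positive by construction, since none of them has a real effect on `y`.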

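See the following for further reading (on CrossValidated):

Algorithms for automatic model selection

Understanding why stepwise selection based on p-values is bad

Why are p-values misleading after performing a stepwise selection?

What do we know about p-hacking "in the wild"?

ASA discusses limitations of p-values - what are the alternatives?

On external websites/blogs:

And some journal articles: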

Henderson, D.A. and Denison, D.R., 1989. Stepwise regression in social and psychological research. Psychological Reports, 64(1), pp. 251-257.
https://journals.sagepub.com/doi/abs/10.2466/pr0.1989.64.1.251

Hurvich, C.M. & Tsai, C.L. (1990) The impact of model selection on inference in linear regression. The American Statistician, 44, 214–217.
https://www.tandfonline.com/doi/abs/10.1080/00031305.1990.10475722

Stephens, P.A., Buskirk, S.W., Hayward, G.D. & Martinez del Rio, C. (2005) Information theory and hypothesis testing: a call for pluralism. Journal of Applied Ecology, 42, 4–12.
https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2664.2005.01002.x

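Steyerberg, E.W., Eijkemans, M.J.C. & Habbema, J.D.F. (1999) Stepwise selection in small data sets: a simulation study of bias in logistic regression analysis. Journal of Clinical Epidemiology, 52, 935–942.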
https://www.sciencedirect.com/science/article/abs/pii/S0895435699001031

Thompson, B., 2001. Significance, effect sizes, stepwise methods, and other issues: Strong arguments move the field. The Journal of Experimental Education, 70(1), pp. 80-93.
https://www.tandfonline.com/doi/abs/10.1080/00220970109599499

Whittingham, M.J., Stephens, P.A., Bradbury, R.B. and Freckleton, R.P., 2006. Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75(5), pp. 1182-1189.
https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2656.2006.01141.x

Woodside, A.G., 2016. The good practices manifesto: Overcoming bad practices pervasive in current research in business. Journal of Business Research, 69(2), pp. 365-381.
https://www.sciencedirect.com/science/article/abs/pii/S0148296315004142
