How to loop/repeat a linear regression in R

Question

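I have figured out how to make a table in R with 4 variables, which I am using for multiple linear regressions. The dependent variable for each regression (Lung) is taken from one column of a csv table with 22,000 columns. One of the independent variables (Blood) is taken from the corresponding column of a similar table.

Each column represents the level of a particular gene, which is why there are so many of them. There are also two additional variables (Age and Gender of each patient). When I enter the linear regression equation, I use lm(Lung[,1] ~ Blood[,1] + Age + Gender), which works for one gene.

I am looking for a way to enter this equation and have R compute it for all the remaining columns of Lung and Blood, and hopefully output the coefficients into a table.

Any help would be appreciated!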

Answer 1

You want to run 22,000 linear regressions and extract the coefficients? From a coding standpoint, that is simple to do.

set.seed(1)

# number of columns in the Lung and Blood data.frames. 22,000 for you?
n <- 5 

# dummy data
obs <- 50 # observations
Lung <- data.frame(matrix(rnorm(obs*n), ncol=n))
Blood <- data.frame(matrix(rnorm(obs*n), ncol=n))
Age <- sample(20:80, obs)
Gender  <- factor(rbinom(obs, 1, .5))

# run n regressions
my_lms <- lapply(1:n, function(x) lm(Lung[,x] ~ Blood[,x] + Age + Gender))

# extract just coefficients
sapply(my_lms, coef)

# if you need more info, get full summary call. now you can get whatever, like:
summaries <- lapply(my_lms, summary)
# ...coefficients with p values:
lapply(summaries, function(x) x$coefficients[, c(1,4)])
# ...or r-squared values
sapply(summaries, function(x) c(r_sq = x$r.squared, 
                                adj_r_sq = x$adj.r.squared))

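The models are stored in a list, where model 3 (with DV Lung[, 3] and IVs Blood[,3] + Age + Gender) is in my_lms[[3]], and so on. You can use the apply functions on the list to produce summaries, from which you can extract the numbers you want.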
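Since the question asks for the coefficients in a table, here is a minimal sketch of one way to get there. It reuses the dummy data from above and keeps only the coefficients from each fit rather than the full model objects, which may matter for memory with 22,000 genes. The column labels and the commented-out file name are my own choices for illustration, not anything required by lm().

```r
set.seed(1)

# Dummy data with the same shape as above (n would be 22,000 for the real data)
n <- 5
obs <- 50
Lung <- data.frame(matrix(rnorm(obs * n), ncol = n))
Blood <- data.frame(matrix(rnorm(obs * n), ncol = n))
Age <- sample(20:80, obs)
Gender <- factor(rbinom(obs, 1, 0.5))

# Fit one model per gene and keep only its coefficient vector.
# sapply() returns a 4 x n matrix, so transpose to get one row per gene.
coef_table <- t(sapply(seq_len(n), function(x) {
  coef(lm(Lung[, x] ~ Blood[, x] + Age + Gender))
}))

# Readable labels (illustrative names): one row per gene, one column per term
colnames(coef_table) <- c("Intercept", "Blood", "Age", "Gender1")
rownames(coef_table) <- names(Lung)
coef_table <- as.data.frame(coef_table)

# Optionally save the table to disk (hypothetical file name)
# write.csv(coef_table, "coefficients.csv")
```

Each row of coef_table then corresponds to one Lung/Blood column pair, so row 3 holds the same numbers as coef(my_lms[[3]]).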

Answer 2

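The question seems to be about how to call a regression function with a formula that is modified inside a loop.

Here is how you can do it (using the diamonds dataset):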

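diamonds <- ggplot2::diamonds
strCols <- names(diamonds)

formula <- list(); model <- list()
for (i in 1:1) {  # widen the range (e.g. 1:3) to loop over more predictors
  # build the formula from column names: price ~ x when i is 1
  formula[[i]] <- as.formula(paste0(strCols[7], " ~ ", strCols[7 + i]))
  # pass the data explicitly instead of relying on attach()
  model[[i]] <- glm(formula[[i]], data = diamonds)

  # then you can plot or do anything else with the result ...
  png(filename = sprintf("diamonds_price=glm(%s).png", strCols[7 + i]))
  par(mfrow = c(2, 2))
  plot(model[[i]])
  dev.off()
}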

Answer 3

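Sensible or not, to make the loop at least work somehow, you need:

y <- c(1, 5, 6, 2, 5, 10)       # response
x1 <- c(2, 12, 8, 1, 16, 17)    # predictor
x2 <- c(2, 14, 5, 1, 17, 17)    # predictor
df <- data.frame(y, x1, x2)     # data frame that data=df refers to
predictorlist <- list("x1", "x2")
for (i in predictorlist) {
  # paste() builds the formula string, e.g. "y ~ x1"
  model <- lm(paste("y ~", i[[1]]), data = df)
  print(summary(model))
}

The paste function will do the trick.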
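If you would rather keep the fitted models than print them, a small variation is sketched below. It swaps paste() for base R's reformulate() and stores each fit in a named list. The names dat, predictors, models and slopes are only illustrative.

```r
y <- c(1, 5, 6, 2, 5, 10)       # response
x1 <- c(2, 12, 8, 1, 16, 17)    # predictor
x2 <- c(2, 14, 5, 1, 17, 17)    # predictor
dat <- data.frame(y, x1, x2)

predictors <- c("x1", "x2")

# One model per predictor; reformulate() builds y ~ x1, then y ~ x2
models <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "y"), data = dat)
})
names(models) <- predictors

# Pull the slope (second coefficient) out of each stored model
slopes <- sapply(models, function(m) unname(coef(m)[2]))
```

After this, models[["x2"]] is the full lm object for y ~ x2, and slopes is a named numeric vector with one entry per predictor.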

Answer 4

A tidyverse addition: with map()

Another way, using map2() from the purrr package:

library(purrr)

xs <- anscombe[,1:3] # Select variables of interest
ys <- anscombe[,5:7]

map2_df(ys, xs,
        function(i,j){
          m <- lm(i ~j + x4 , data = anscombe)
          coef(m)
        })

The output is a data frame (tibble) of all the coefficients:

  `(Intercept)`     j      x4
1          4.33 0.451 -0.0987
2          6.42 0.373 -0.253 
3          2.30 0.526  0.0518

If more variables vary, this can be done using the pmap() function.
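As a rough sketch of the pmap() variant, the example below lets three sets of columns vary together. It uses the built-in mtcars data instead of anscombe, because the x1 to x3 columns of anscombe are identical and would be collinear as two predictors. The pairing of columns is arbitrary and only meant to show the mechanics.

```r
library(purrr)

# Three parallel sets of columns, converted to plain lists of vectors
ys <- as.list(mtcars[c("mpg", "qsec")])    # responses
xs <- as.list(mtcars[c("wt", "hp")])       # first varying predictor
zs <- as.list(mtcars[c("disp", "drat")])   # second varying predictor

# pmap() walks the three lists in parallel and matches them to the
# function arguments by name: (mpg, wt, disp), then (qsec, hp, drat)
fits <- pmap(list(y = ys, x = xs, z = zs), function(y, x, z) {
  coef(lm(y ~ x + z))
})

# Stack the coefficient vectors into one matrix, one row per model
coef_table <- do.call(rbind, fits)
```

The first row of coef_table matches coef(lm(mpg ~ wt + disp, data = mtcars)), only with the generic term names x and z.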

Answer 5

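The following approach works for multivariable models where you have several outcomes and several predictors. I will use some example data to illustrate the idea and how it works.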

df <- data.frame(y1=sample(1:5, size=50, replace=TRUE),
                 y2=sample(1:5, size=50, replace=TRUE),
                 x1=sample(1:5, size=50, replace=TRUE),
                 x2=sample(1:5, size=50, replace=TRUE),
                 x3=sample(1:5, size=50, replace=TRUE),
                 x4=sample(1L:2L, size=50, replace=TRUE))
df

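The function takes a named argument dv, but uses the ellipsis so that you can pass any number of predictors. Inside the function, you use deparse() and substitute() on the predictors and pass them, together with dv, to the reformulate() function. I included dv=dv in the returned list so that you can see which dv each model output belongs to.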

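# custom lm function
lm_func <- function(dv, ...){
  # capture the unevaluated predictor names passed through ... as strings
  x = sapply(substitute(...()), deparse)
  # build the formula dv ~ x1 + x2 + ...
  f = reformulate(termlabels=x, response=dv)
  # fit the model on the data frame df defined above
  model = lm(f, data=df)
  list(dv=dv, model_summary=summary(model))
}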

Next, select the dvs from the target data frame and name them.

# select the dvs and set names
dvs <- names(df)[1:2]
dvs <- purrr::set_names(dvs)

Finally, run a loop over the dvs and store the results.

# run a for loop and save the output for each loop
lm_out = list()
for (i in 1:length(dvs)){
  lm_out[[i]] = (lm_func(dvs[i], x1, x2))
  }
lm_out

Note: more could be done inside lm_func, for example in terms of which parts of the model summary are extracted.
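As one possible direction, here is a sketch of a variant that returns only the estimates and p-values as a data frame, so the results for all dvs can be stacked into a single table. The function name coef_func and its arguments are hypothetical, and it takes the predictors as a character vector rather than through the ellipsis to keep the example short.

```r
set.seed(42)
df <- data.frame(y1 = sample(1:5, size = 50, replace = TRUE),
                 y2 = sample(1:5, size = 50, replace = TRUE),
                 x1 = sample(1:5, size = 50, replace = TRUE),
                 x2 = sample(1:5, size = 50, replace = TRUE))

# Hypothetical helper: fit dv ~ predictors and return one row per term
coef_func <- function(dv, predictors, data) {
  f <- reformulate(termlabels = predictors, response = dv)
  est <- summary(lm(f, data = data))$coefficients
  data.frame(dv = dv,
             term = rownames(est),
             estimate = est[, "Estimate"],
             p_value = est[, "Pr(>|t|)"],
             row.names = NULL)
}

# Run it for every dv and bind the pieces into one table
dvs <- c("y1", "y2")
coef_table <- do.call(rbind, lapply(dvs, coef_func,
                                    predictors = c("x1", "x2"),
                                    data = df))
```

The result has one row per dv and term (intercept, x1, x2), which is easier to filter or export than a list of summaries.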
