如何计算数据框中列组合的多个线性模型？

Question

好吧，这个问题可能有点复杂，所以让我尽力解释我的场景。

我有一个数据框，类似于下面设置的示例，其中有一个预测列 (

xpred

) 和三个响应列 (

y1:y3

)。所有这些都有一个关联的对数转换列，因为我希望使用对数转换数据来运行线性模型。

lm()

创建于 2024 年 10 月 25 日，使用

reprex v2.1.1现在，我可以轻松地单独运行模型（下面的示例），但是当我想在

suppressWarnings(library(tidyverse))

# Create example data. 
set.seed(10)
df <- data.frame(
  subject = rep(paste("Subject", LETTERS[1:5]), each = 10),
  xpred = rep(1:10, 5),
  y1 = sort(runif(10, min = 130, max = 220), decreasing = TRUE),
  y2 = sort(runif(10, min = 10, max = 90), decreasing = TRUE),
  y3 = sort(runif(10, min = 2, max = 5), decreasing = TRUE)
  )

# Log transfrom pred_x:y3 columns.
df <- df %>%
  group_by(subject) %>%
  mutate(across(.cols = xpred:y3, .fns = ~log(.x), .names = "{col}_log")) %>%
  ungroup()

head(df)
#> # A tibble: 6 × 9
#>   subject   xpred    y1    y2    y3 xpred_log y1_log y2_log y3_log
#>   <chr>     <int> <dbl> <dbl> <dbl>     <dbl>  <dbl>  <dbl>  <dbl>
#> 1 Subject A     1  192.  76.9  4.59     0       5.26   4.34   1.52
#> 2 Subject A     2  185.  62.1  4.51     0.693   5.22   4.13   1.51
#> 3 Subject A     3  176.  57.7  4.33     1.10    5.17   4.05   1.46
#> 4 Subject A     4  169.  55.4  4.31     1.39    5.13   4.01   1.46
#> 5 Subject A     5  168.  44.3  4.12     1.61    5.13   3.79   1.42
#> 6 Subject A     6  158.  41.9  3.85     1.79    5.06   3.74   1.35

列中的每个独特主题的以下列组合上运行模型时，这会变得有点重复：

 subject

、

y1_log ~ xpred_log

和

y2_log ~ xpred_log

。

如果我单独做这件事，这就是我的代码对于一个主题的样子。

y3_log ~ xpred_log

正如您在我的数据中注意到的那样，

# Subset data for one subject.
df_subset <- df %>% filter(subject == "Subject A")

# Run model.
model <- lm(df_subset$y1_log ~ df_subset$xpred_log)

# Extract coefficients for prediction calculation.
c <- unname(exp(model$coefficients["(Intercept)"]))
n <- unname(model$coefficients["df_subset$xpred_log"])

的范围从 1 到 10，因为

xpred

之前已针对这些值计算过。我想要为这个特定项目做的是完成

y1:y3

，使其范围从 1 到 30，并根据公式

xpred

预测

y1:y3

的值 11 到 30，其中

c*x^n

是

。例如，如果我想预测

xpred

的值为 11，则它将是

y1

。

希望你还在我身边！

所以，本质上，我想要一种干净/有效的方法：

c*11^n
。我假设这将依赖于某种形式的
```
lm()
```
调用
```
group_by()
```
。
dplyr
和
```
c
```
）根据
```
n
```
值 11 到 30（已经知道 1 到 10）来预测
```
y1:y3
```
。

xpred

值 1 到 30 的最终输出存储在整洁的数据框中以供以后分析。

感谢对此提供的任何帮助，如果有任何需要澄清的地方，请告诉我，因为这是我的第一篇文章。

Answer 1

dplyr

的按行操作小插图的建模示例的修改。 xpred

结果：

library(tidyverse) # pivot longer and then log transform; # create nested tibble with data for every subject-response combination (grouped rowwise); # fit models and extract coefficients from all combinations df_coef <- df |> pivot_longer(y1:y3, names_to = "response", values_to = "response_value") |> mutate(across(.cols = c(xpred, response_value), .fns = ~log(.x), .names = "{col}_log")) |> nest_by(subject, response) |> mutate( lm(response_value_log ~ xpred_log, data = data) |> coef() |> bind_rows() |> rename(`c` = "(Intercept)", n = "xpred_log"), c = exp(c) ) # long nested tibble: df_coef #> # A tibble: 15 × 5 #> # Rowwise: subject, response #> subject response data c n #> <chr> <chr> <list<tibble[,4]>> <dbl> <dbl> #> 1 Subject A y1 [10 × 4] 200. -0.132 #> 2 Subject A y2 [10 × 4] 101. -0.621 #> 3 Subject A y3 [10 × 4] 5.23 -0.226 #> 4 Subject B y1 [10 × 4] 200. -0.132 #> 5 Subject B y2 [10 × 4] 101. -0.621 #> 6 Subject B y3 [10 × 4] 5.23 -0.226 #> 7 Subject C y1 [10 × 4] 200. -0.132 #> 8 Subject C y2 [10 × 4] 101. -0.621 #> 9 Subject C y3 [10 × 4] 5.23 -0.226 #> 10 Subject D y1 [10 × 4] 200. -0.132 #> 11 Subject D y2 [10 × 4] 101. -0.621 #> 12 Subject D y3 [10 × 4] 5.23 -0.226 #> 13 Subject E y1 [10 × 4] 200. -0.132 #> 14 Subject E y2 [10 × 4] 101. -0.621 #> 15 Subject E y3 [10 × 4] 5.23 -0.226 # first data tibble (SubjectA, response y1) df_coef$data[[1]] #> # A tibble: 10 × 4 #> xpred response_value xpred_log response_value_log #> <int> <dbl> <dbl> <dbl> #> 1 1 192. 0 5.26 #> 2 2 185. 0.693 5.22 #> 3 3 176. 1.10 5.17 #> 4 4 169. 1.39 5.13 #> 5 5 168. 1.61 5.13 #> 6 6 158. 1.79 5.06 #> 7 7 155. 1.95 5.04 #> 8 8 155. 2.08 5.04 #> 9 9 150. 2.20 5.01 #> 10 10 138. 2.30 4.92 # replace nested data tibbles with calculated values for xpred = 11:30; # revrese nesting & pivoting df_coef |> mutate(data = tibble(xpred = 11:30, response_value = c*xpred^n) |> list(), .keep = "unused") |> unnest(data) |> pivot_wider(names_from = response, values_from = response_value)

示例数据：

#> # A tibble: 100 × 5 #> # Groups: subject [5] #> subject xpred y1 y2 y3 #> <chr> <int> <dbl> <dbl> <dbl> #> 1 Subject A 11 146. 22.8 3.04 #> 2 Subject A 12 144. 21.6 2.98 #> 3 Subject A 13 142. 20.5 2.93 #> 4 Subject A 14 141. 19.6 2.88 #> 5 Subject A 15 140. 18.8 2.84 #> 6 Subject A 16 139. 18.0 2.80 #> 7 Subject A 17 137. 17.4 2.76 #> 8 Subject A 18 136. 16.8 2.72 #> 9 Subject A 19 135. 16.2 2.69 #> 10 Subject A 20 135. 15.7 2.66 #> # ℹ 90 more rows

Answer 2

# Create example data. 
set.seed(10)
df <- data.frame(
  subject = rep(paste("Subject", LETTERS[1:5]), each = 10),
  xpred = rep(1:10, 5),
  y1 = sort(runif(10, min = 130, max = 220), decreasing = TRUE),
  y2 = sort(runif(10, min = 10, max = 90), decreasing = TRUE),
  y3 = sort(runif(10, min = 2, max = 5), decreasing = TRUE)
)

主题，这要快得多。

by

步骤可以集成到公式中，因此您可以立即开始使用原始

log

，无需麻烦。

df

注意，

我稍微更改了您的示例数据，以便主题具有不同的值。

数据：

> by(df, df$subject, \(X) { + res <- coef(lm(log(cbind(y1, y2, y3)) ~ log(xpred), X)) + data.frame(subject=el(X$subject), response=colnames(res), + c=exp(res[1, ]), n=res[2, ]) + }) |> c(make.row.names=FALSE) |> do.call(what='rbind') subject response c n 1 Subject A y1 216.527177 -0.03708192 2 Subject A y2 91.174983 -0.13562541 3 Subject A y3 4.961212 -0.03240209 4 Subject B y1 198.757004 -0.03864692 5 Subject B y2 60.590815 -0.07311595 6 Subject B y3 4.623532 -0.05780232 7 Subject C y1 177.663964 -0.03218263 8 Subject C y2 50.915641 -0.07458970 9 Subject C y3 4.081367 -0.09050648 10 Subject D y1 163.418428 -0.03346902 11 Subject D y2 40.833918 -0.19185546 12 Subject D y3 3.199167 -0.06265984 13 Subject E y1 151.794013 -0.05738562 14 Subject E y2 29.680704 -0.38040373 15 Subject E y3 2.757717 -0.10324458

如何计算数据框中列组合的多个线性模型？

问题描述投票：0回答：2

2个回答

最新问题

如何计算数据框中列组合的多个线性模型？

问题描述 投票：0回答：2

2个回答

最新问题

问题描述投票：0回答：2