好吧,这个问题可能有点复杂,所以让我尽力解释我的场景。
我有一个数据框,类似于下面设置的示例,其中有一个预测列 (
xpred
) 和三个响应列 (y1:y3
)。所有这些都有一个关联的对数转换列,因为我希望使用对数转换数据来运行线性模型。lm()
创建于 2024 年 10 月 25 日,使用 reprex v2.1.1 现在,我可以轻松地单独运行模型(下面的示例),但是当我想在
suppressWarnings(library(tidyverse))
# Create example data.
set.seed(10)
df <- data.frame(
subject = rep(paste("Subject", LETTERS[1:5]), each = 10),
xpred = rep(1:10, 5),
y1 = sort(runif(10, min = 130, max = 220), decreasing = TRUE),
y2 = sort(runif(10, min = 10, max = 90), decreasing = TRUE),
y3 = sort(runif(10, min = 2, max = 5), decreasing = TRUE)
)
# Log transfrom pred_x:y3 columns.
df <- df %>%
group_by(subject) %>%
mutate(across(.cols = xpred:y3, .fns = ~log(.x), .names = "{col}_log")) %>%
ungroup()
head(df)
#> # A tibble: 6 × 9
#> subject xpred y1 y2 y3 xpred_log y1_log y2_log y3_log
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Subject A 1 192. 76.9 4.59 0 5.26 4.34 1.52
#> 2 Subject A 2 185. 62.1 4.51 0.693 5.22 4.13 1.51
#> 3 Subject A 3 176. 57.7 4.33 1.10 5.17 4.05 1.46
#> 4 Subject A 4 169. 55.4 4.31 1.39 5.13 4.01 1.46
#> 5 Subject A 5 168. 44.3 4.12 1.61 5.13 3.79 1.42
#> 6 Subject A 6 158. 41.9 3.85 1.79 5.06 3.74 1.35
列中的每个独特主题的以下列组合上运行模型时,这会变得有点重复:
subject
、y1_log ~ xpred_log
和 y2_log ~ xpred_log
。如果我单独做这件事,这就是我的代码对于一个主题的样子。
y3_log ~ xpred_log
正如您在我的数据中注意到的那样,
# Subset data for one subject.
df_subset <- df %>% filter(subject == "Subject A")
# Run model.
model <- lm(df_subset$y1_log ~ df_subset$xpred_log)
# Extract coefficients for prediction calculation.
c <- unname(exp(model$coefficients["(Intercept)"]))
n <- unname(model$coefficients["df_subset$xpred_log"])
的范围从 1 到 10,因为
xpred
之前已针对这些值计算过。我想要为这个特定项目做的是完成 y1:y3
,使其范围从 1 到 30,并根据公式 xpred
预测 y1:y3
的值 11 到 30,其中 c*x^n
是 x
。例如,如果我想预测 xpred
的值为 11,则它将是 y1
。希望你还在我身边!
所以,本质上,我想要一种干净/有效的方法:
c*11^n
。我假设这将依赖于某种形式的
lm()
调用 group_by()
。
dplyr
和
c
)根据 n
值 11 到 30(已经知道 1 到 10)来预测 y1:y3
。
xpred
值 1 到 30 的最终输出存储在整洁的数据框中以供以后分析。
感谢对此提供的任何帮助,如果有任何需要澄清的地方,请告诉我,因为这是我的第一篇文章。
结果:
library(tidyverse)
# pivot longer and then log transform;
# create nested tibble with data for every subject-response combination (grouped rowwise);
# fit models and extract coefficients from all combinations
df_coef <-
df |>
pivot_longer(y1:y3, names_to = "response", values_to = "response_value") |>
mutate(across(.cols = c(xpred, response_value), .fns = ~log(.x), .names = "{col}_log")) |>
nest_by(subject, response) |>
mutate(
lm(response_value_log ~ xpred_log, data = data) |>
coef() |>
bind_rows() |>
rename(`c` = "(Intercept)", n = "xpred_log"),
c = exp(c)
)
# long nested tibble:
df_coef
#> # A tibble: 15 × 5
#> # Rowwise: subject, response
#> subject response data c n
#> <chr> <chr> <list<tibble[,4]>> <dbl> <dbl>
#> 1 Subject A y1 [10 × 4] 200. -0.132
#> 2 Subject A y2 [10 × 4] 101. -0.621
#> 3 Subject A y3 [10 × 4] 5.23 -0.226
#> 4 Subject B y1 [10 × 4] 200. -0.132
#> 5 Subject B y2 [10 × 4] 101. -0.621
#> 6 Subject B y3 [10 × 4] 5.23 -0.226
#> 7 Subject C y1 [10 × 4] 200. -0.132
#> 8 Subject C y2 [10 × 4] 101. -0.621
#> 9 Subject C y3 [10 × 4] 5.23 -0.226
#> 10 Subject D y1 [10 × 4] 200. -0.132
#> 11 Subject D y2 [10 × 4] 101. -0.621
#> 12 Subject D y3 [10 × 4] 5.23 -0.226
#> 13 Subject E y1 [10 × 4] 200. -0.132
#> 14 Subject E y2 [10 × 4] 101. -0.621
#> 15 Subject E y3 [10 × 4] 5.23 -0.226
# first data tibble (SubjectA, response y1)
df_coef$data[[1]]
#> # A tibble: 10 × 4
#> xpred response_value xpred_log response_value_log
#> <int> <dbl> <dbl> <dbl>
#> 1 1 192. 0 5.26
#> 2 2 185. 0.693 5.22
#> 3 3 176. 1.10 5.17
#> 4 4 169. 1.39 5.13
#> 5 5 168. 1.61 5.13
#> 6 6 158. 1.79 5.06
#> 7 7 155. 1.95 5.04
#> 8 8 155. 2.08 5.04
#> 9 9 150. 2.20 5.01
#> 10 10 138. 2.30 4.92
# replace nested data tibbles with calculated values for xpred = 11:30;
# revrese nesting & pivoting
df_coef |>
mutate(data = tibble(xpred = 11:30, response_value = c*xpred^n) |> list(), .keep = "unused") |>
unnest(data) |>
pivot_wider(names_from = response, values_from = response_value)
#> # A tibble: 100 × 5
#> # Groups: subject [5]
#> subject xpred y1 y2 y3
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 Subject A 11 146. 22.8 3.04
#> 2 Subject A 12 144. 21.6 2.98
#> 3 Subject A 13 142. 20.5 2.93
#> 4 Subject A 14 141. 19.6 2.88
#> 5 Subject A 15 140. 18.8 2.84
#> 6 Subject A 16 139. 18.0 2.80
#> 7 Subject A 17 137. 17.4 2.76
#> 8 Subject A 18 136. 16.8 2.72
#> 9 Subject A 19 135. 16.2 2.69
#> 10 Subject A 20 135. 15.7 2.66
#> # ℹ 90 more rows
# Create example data.
set.seed(10)
df <- data.frame(
subject = rep(paste("Subject", LETTERS[1:5]), each = 10),
xpred = rep(1:10, 5),
y1 = sort(runif(10, min = 130, max = 220), decreasing = TRUE),
y2 = sort(runif(10, min = 10, max = 90), decreasing = TRUE),
y3 = sort(runif(10, min = 2, max = 5), decreasing = TRUE)
)
主题,这要快得多。
by
步骤可以集成到公式中,因此您可以立即开始使用原始 log
,无需麻烦。df
注意,我稍微更改了您的示例数据,以便主题具有不同的值。
> by(df, df$subject, \(X) {
+ res <- coef(lm(log(cbind(y1, y2, y3)) ~ log(xpred), X))
+ data.frame(subject=el(X$subject), response=colnames(res),
+ c=exp(res[1, ]), n=res[2, ])
+ }) |> c(make.row.names=FALSE) |> do.call(what='rbind')
subject response c n
1 Subject A y1 216.527177 -0.03708192
2 Subject A y2 91.174983 -0.13562541
3 Subject A y3 4.961212 -0.03240209
4 Subject B y1 198.757004 -0.03864692
5 Subject B y2 60.590815 -0.07311595
6 Subject B y3 4.623532 -0.05780232
7 Subject C y1 177.663964 -0.03218263
8 Subject C y2 50.915641 -0.07458970
9 Subject C y3 4.081367 -0.09050648
10 Subject D y1 163.418428 -0.03346902
11 Subject D y2 40.833918 -0.19185546
12 Subject D y3 3.199167 -0.06265984
13 Subject E y1 151.794013 -0.05738562
14 Subject E y2 29.680704 -0.38040373
15 Subject E y3 2.757717 -0.10324458