Suppose I have the following data:
df <- structure(list(treat = structure(1:4, levels = c("Control", "Alexander Hamilton",
"Politicians pay attention", "Mark your calendar"), class = "factor"),
female_n = c(314709L, 10456L, 10481L, 10455L), female_mean = c(0.506,
0.506, 0.504, 0.5), female_sd = c(0.5, 0.5, 0.5, 0.5), birth_year_n = c(314709L,
10456L, 10481L, 10455L), birth_year_mean = c(1973.74, 1973.654,
1973.486, 1973.766), birth_year_sd = c(16.867, 16.997, 16.869,
16.89), provided_phone_no_n = c(314709L, 10456L, 10481L,
10455L), provided_phone_no_mean = c(0.656, 0.666, 0.663,
0.647), provided_phone_no_sd = c(0.475, 0.472, 0.473, 0.478
), dem_n = c(314709L, 10456L, 10481L, 10455L), dem_mean = c(0.48,
0.474, 0.482, 0.478), dem_sd = c(0.5, 0.499, 0.5, 0.5), rep_n = c(314709L,
10456L, 10481L, 10455L), rep_mean = c(0.136, 0.141, 0.142,
0.138), rep_sd = c(0.343, 0.348, 0.349, 0.345), uaf_n = c(314709L,
10456L, 10481L, 10455L), uaf_mean = c(0.363, 0.365, 0.357,
0.363), uaf_sd = c(0.481, 0.481, 0.479, 0.481)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
I would like to add a new *_se column for each variable group in the data, computed from that group's existing *_n and *_sd columns, i.e. one each for female_*, birth_year_*, provided_phone_no_*, dem_*, rep_*, and uaf_*.
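I thought mutate(across()) might be the right helper for this, but I'm having trouble taking a substring of {.col} and getting R to recognize the result as a column name. Here is my attempt so far: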
df %>%
  mutate(
    across(ends_with("_sd"),
           list(
             se = ~.x / sqrt(!!ensym("{str_replace(.col, '_sd', '_n')}"))
           )
    )
  )
The above returns an error:
Error in `ensym()`:
! `arg` must be a symbol
Backtrace:
1. ... %>% ...
10. rlang::abort(message = message)
Can anyone see where I'm going wrong?

This really is an ideal case for pivoting the data to long format and then back to wide:
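library(tidyr)
library(dplyr)

df |>
  # split e.g. "female_sd" into grp = "female" and a value column named "sd"
  pivot_longer(cols = -treat, names_pattern = "(.*)_(.*)",
               names_to = c("grp", ".value")) |>
  mutate(se = sd / sqrt(n)) |>
  # back to wide, keeping each group's n, mean, sd and se columns together
  pivot_wider(names_from = grp, values_from = n:se,
              names_glue = "{grp}_{.value}", names_vary = "slowest")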
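
If you would rather stay in wide format, something along these lines should also work. As far as I know, the glue-style {.col} placeholder is only interpolated in the .names argument of across(); inside the function itself the name of the current column comes from cur_column(), and the matching *_n column can then be looked up by name through the .data pronoun. This is only a minimal sketch: toy and its a_* / b_* columns are made-up stand-ins for df, chosen so the arithmetic is easy to check by hand.

```r
library(dplyr)

# Hypothetical toy data using the same naming scheme as df:
# <var>_n, <var>_mean, <var>_sd
toy <- tibble(
  treat  = c("Control", "Treated"),
  a_n    = c(100L, 25L), a_mean = c(0.5, 0.4), a_sd = c(0.5, 0.5),
  b_n    = c(100L, 25L), b_mean = c(10, 12),   b_sd = c(2, 3)
)

res <- toy |>
  mutate(across(
    ends_with("_sd"),
    # cur_column() is the name of the *_sd column being processed;
    # swap the suffix to find the matching *_n column by name
    ~ .x / sqrt(.data[[sub("_sd$", "_n", cur_column())]]),
    # name each new column <var>_se instead of the default <var>_sd_1
    .names = "{sub('_sd$', '_se', .col)}"
  ))

res
```

Note that the new *_se columns are appended at the end of the data frame rather than next to their groups, so the pivot approach is still the tidier option if column order matters.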