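As part of a pipeline, I'd like to take a data frame or tibble and rename a subset of columns specified by a vector of positional indices, with the new column names given as a function of their index rather than of their current name. I don't want to leave the pipe, store intermediate results, store the index vector, or have to type the index vector twice (an accident waiting to happen if I later want to change it).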
I can achieve my goal by piping into a horrible anonymous function that uses dplyr::rename_with or rlang::set_names. But surely there's a neater way to do this than what I've come up with?
library(tidyverse)
# Base R does what I want: but not pipe-friendly
temp <- starwars |>
  head(c(2, 6))
idx <- c(2, 4:6)
colnames(temp)[idx] <- str_c("col_", idx, "_new")
print(temp)
#> # A tibble: 2 × 6
#> name col_2_new mass col_4_new col_5_new col_6_new
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
# Can repeat the vector of selected indices in the .fn argument of rename_with
# but surely there's a way to avoid writing c(2, 4:6) twice?
starwars |>
  head(c(2, 6)) |>
  rename_with(.cols = c(2, 4:6), ~ str_c("col_", c(2, 4:6), "_new"))
#> # A tibble: 2 × 6
#> name col_2_new mass col_4_new col_5_new col_6_new
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
# rename_with doesn't *quite* do what I want here
# Can specify cols by index, but .x is the column name not its index
starwars |>
  head(c(2, 6)) |>
  rename_with(.cols = c(2, 4:6), ~ str_c("col_", .x, "_new"))
#> # A tibble: 2 × 6
#> name col_height_new mass col_hair_color_new col_skin_colo…¹ col_e…²
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
#> # … with abbreviated variable names ¹col_skin_color_new, ²col_eye_color_new
# Anonymous function avoids repeating c(2, 4:6) - supplying the external vector
# means using all_of() or any_of() depending on whether you want an error if
# an index is missing.
# But surely there's an easier way than this?
starwars |>
  head(c(2, 6)) |>
  (\(tbl, idx) rename_with(tbl, .cols = all_of(idx),
                           ~ str_c("col_", idx, "_new")))(c(2, 4:6))
#> # A tibble: 2 × 6
#> name col_2_new mass col_4_new col_5_new col_6_new
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
# There's also rlang::set_names ... but this is even uglier
starwars |>
  head(c(2, 6)) |>
  (\(tbl, idx) set_names(tbl, ifelse(seq_along(tbl) %in% idx,
                                     str_c("col_", seq_along(tbl), "_new"),
                                     colnames(tbl))))(c(2, 4:6))
#> # A tibble: 2 × 6
#> name col_2_new mass col_4_new col_5_new col_6_new
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
Related questions, but not duplicates, because they don't ask for the new names to be a function of the index: "R: dplyr - Rename column name by position instead of name" and "How to dplyr rename a column, by column index?"
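I don't think there is a canonical or clean way to do this without either (i) using the index values twice, (ii) storing them in a temporary variable, or (iii) using a hacky approach that stores the values on the fly in a temporary variable or function and then reuses them.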
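I'd say a canonical way is to create a lookup vector and use it inside rename(all_of()). When you come back to this code later, it is easy to see how the column names were recoded.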
library(tidyverse)
idx <- c(2, 4:6)
lookup_vec <- setNames(idx, str_c("col_", idx, "_new"))
starwars |>
  head(c(2, 6)) |>
  rename(all_of(lookup_vec))
#> # A tibble: 2 × 6
#> name col_2_new mass col_4_new col_5_new col_6_new
#> <chr> <int> <dbl> <chr> <chr> <chr>
#> 1 Luke Skywalker 172 77 blond fair blue
#> 2 C-3PO 167 75 <NA> gold yellow
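To make the mechanics a bit more concrete, here is a minimal base R sketch of what such a lookup vector contains: the names are the new column names and the values are the column positions. It uses paste0() instead of str_c() only so that it runs without loading any packages; that substitution is mine, not part of the answer above.

```r
# Positions of the columns to rename (same indices as in the example above)
idx <- c(2, 4:6)

# Named vector: names are the new column names, values are the positions
lookup_vec <- setNames(idx, paste0("col_", idx, "_new"))

names(lookup_vec)   # the new names, one per selected position
unname(lookup_vec)  # the positions themselves
```

Because the new names are derived from idx when the vector is built, the indices only have to be written once.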
If you want to apply this kind of operation a lot and want to avoid temporary variables at all costs, a helper function might work:
rename_at_idx <- function(df, idx, before = "", after = "") {
  rename(df, all_of(setNames(idx, str_c(before, idx, after))))
}
starwars |>
  head(c(2, 6)) |>
  rename_at_idx(c(2, 4:6), "col_", "_new")
#> same output
Created on 2023-03-20 with reprex v2.0.2
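Since the question asks for new names as an arbitrary function of the index, the helper idea could also be generalised to take a naming function rather than a fixed prefix and suffix. The following is only a sketch under my own assumptions: rename_by_index is a hypothetical name, it is written in base R (no tidyselect), it was not part of the reprex above, and it needs R 4.1 or later for the native pipe and the \(i) lambda syntax.

```r
# Hypothetical helper (not from dplyr): rename the columns at positions `idx`,
# computing each new name from its index via the function `fn`.
rename_by_index <- function(df, idx, fn) {
  stopifnot(is.function(fn), all(idx %in% seq_along(df)))
  names(df)[idx] <- fn(idx)
  df
}

# Small made-up data frame, used only for illustration
df <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

result <- df |>
  rename_by_index(c(2, 4), \(i) paste0("col_", i, "_new"))

names(result)
#> [1] "a"         "col_2_new" "c"         "col_4_new"
```

The index vector appears once in the pipeline, and the naming rule stays visible at the call site. The trade-off is the same as for rename_at_idx: the helper has to be defined somewhere before the pipe is run.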