How can we get the first non-missing value row-wise (coalesce) across all columns with dplyr (tidyverse), without specifying the column names?
Example data:
df <- data.frame(x = c(NA, "s3", NA, NA,"s4"),
y = c("s1", NA, "s6", "s7", "s4"),
z = c("s1", NA, NA, "s7", NA))
We can use do.call, but this doesn't look very tidy:
df$xyz <- do.call(coalesce, df)
# x y z xyz
# 1 <NA> s1 s1 s1
# 2 s3 <NA> <NA> s3
# 3 <NA> s6 <NA> s6
# 4 <NA> s7 s7 s7
# 5 s4 s4 <NA> s4
This works, but I don't want to specify the columns:
df %>%
  mutate(xyz = coalesce(x, y, z))
# x y z xyz
# 1 <NA> s1 s1 s1
# 2 s3 <NA> <NA> s3
# 3 <NA> s6 <NA> s6
# 4 <NA> s7 s7 s7
# 5 s4 s4 <NA> s4
Something similar to what data.table offers:
library(data.table)
setDT(df)[, xyz := fcoalesce(.SD) ][]
# x y z xyz
# 1: <NA> s1 s1 s1
# 2: s3 <NA> <NA> s3
# 3: <NA> s6 <NA> s6
# 4: <NA> s7 s7 s7
# 5: s4 s4 <NA> s4
Failed attempts:
df %>%
  mutate(xyz = coalesce(all_vars()))
df %>%
  mutate(xyz = coalesce(c_across(all_vars())))
df %>%
  rowwise() %>%
  mutate(xyz = coalesce(all_vars()))
df %>%
  rowwise() %>%
  mutate(xyz = coalesce(c_across(all_vars())))
Any ideas?
Taken from this GitHub discussion, you can create a coacross function:
coacross <- function(...) {
  coalesce(!!!across(...))
}
df %>%
  mutate(xyz = coacross(everything()))
# x y z xyz
# 1 <NA> s1 s1 s1
# 2 s3 <NA> <NA> s3
# 3 <NA> s6 <NA> s6
# 4 <NA> s7 s7 s7
# 5 s4 s4 <NA> s4
We can use the splice operator !!! to inject the data frame into coalesce:
library(dplyr)
df %>% mutate(xyz = coalesce(!!!df))
Or, in a more "tidyverse" style:
df %>% mutate(xyz = coalesce(!!!select(., everything())))
# x y z xyz
# 1 <NA> s1 s1 s1
# 2 s3 <NA> <NA> s3
# 3 <NA> s6 <NA> s6
# 4 <NA> s7 s7 s7
# 5 s4 s4 <NA> s4
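The splice works because coalesce accepts dynamic dots, so a list (or data frame) of vectors can be unpacked into separate arguments. As a minimal sketch, assuming the df from the question, the same call can be checked outside of mutate:

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(x = c(NA, "s3", NA, NA, "s4"),
                 y = c("s1", NA, "s6", "s7", "s4"),
                 z = c("s1", NA, NA, "s7", NA))

# !!! unpacks the columns of df, so this is equivalent to
# coalesce(df$x, df$y, df$z)
spliced  <- coalesce(!!!df)
explicit <- coalesce(df$x, df$y, df$z)

print(spliced)
```

Both calls should give the same character vector, one value per row.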
Here is a possible solution:
df %>%
  mutate(xyz = do.call(coalesce, across()))
#> x y z xyz
#> 1 <NA> s1 s1 s1
#> 2 s3 <NA> <NA> s3
#> 3 <NA> s6 <NA> s6
#> 4 <NA> s7 s7 s7
#> 5 s4 s4 <NA> s4
If you are open to using purrr, you can do this:
library(dplyr)
library(purrr)
df |>
  mutate(xyz = reduce(pick(everything()), coalesce))
# x y z xyz
# 1 <NA> s1 s1 s1
# 2 s3 <NA> <NA> s3
# 3 <NA> s6 <NA> s6
# 4 <NA> s7 s7 s7
# 5 s4 s4 <NA> s4
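reduce also has a .dir argument that controls the direction of the reduction. Note that because coalesce is associative, .dir = "backward" computes coalesce(x, coalesce(y, z)) and still returns the first non-missing value. To get the last non-missing value, swap the arguments passed to coalesce instead:
df |>
  mutate(xyz = reduce(pick(everything()), \(acc, col) coalesce(col, acc)))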
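The example data cannot tell a first-value result from a last-value result, because the non-missing values within each row are identical. As a quick check, here is a minimal sketch using a hypothetical data frame df2 (not from the question) whose rows hold different values. It relies on purrr's documented behaviour that reduce(.x, .f, .dir = "backward") evaluates f(x, f(y, z)); since coalesce is associative, that should match the forward result, and it is swapping the arguments that selects the last value:

```r
library(dplyr)
library(purrr)

# Hypothetical data: non-missing values differ within each row
df2 <- data.frame(x = c("a", NA, NA),
                  y = c("b", "c", NA),
                  z = c(NA, "d", "e"))

# Forward reduction: coalesce(coalesce(x, y), z)
first_fwd <- reduce(df2, coalesce)

# Backward reduction: coalesce(x, coalesce(y, z)), same values as above
first_bwd <- reduce(df2, coalesce, .dir = "backward")

# Swapped arguments: later columns take priority over earlier ones
last_val <- reduce(df2, function(acc, col) coalesce(col, acc))

print(first_fwd)  # "a" "c" "e"
print(last_val)   # "b" "d" "e"
```

The swapped-argument form keeps the "no column names" property of the original answer, since it still reduces over whatever columns are passed in.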