I want to concatenate values across columns, dropping NAs and using the Oxford comma.
x <- data.frame(ID = 1:3,
col1 = c("snap", "snap", NA),
col2 = c(NA, "crackle", "crackle"),
col3 = c(NA, NA, "pop"),
col4 = c(NA, "yummy", NA))
Using the data frame above, I want to concatenate col1:col4 and store the result in x$treats:
x$treats[1]
"snap"
x$treats[2]
"snap, crackle, and yummy"
x$treats[3]
"crackle and pop"
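The data set also has an ID variable that should not be included in the concatenation (so a solution that doesn't let me specify the columns I want is incomplete).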
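Here is another option, although it is more verbose. Wrapping the list generation in a function also lets us add an option to disable the Oxford comma if needed: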
x <- data.frame(
ID = 1:3,
col1 = c("snap", "snap", NA),
col2 = c(NA, "crackle", "crackle"),
col3 = c(NA, NA, "pop"),
col4 = c(NA, "yummy", NA)
)
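# Join the non-NA elements of x into an English-style list.
language_list <- function(x, oxford_comma = TRUE) {
  x <- x[!is.na(x)]
  # Zero or one item: nothing to join
  if (length(x) < 2) {
    return(x)
  }
  last <- tail(x, 1)
  rest <- head(x, -1)
  # Exactly two items: "a and b", no comma
  if (length(rest) == 1) {
    return(paste(rest, "and", last))
  }
  # Three or more items: optional Oxford comma before "and"
  rest <- paste(rest, collapse = ", ")
  paste0(rest, if (oxford_comma) ",", " and ", last)
}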
cols <- paste0("col", 1:4)
x$treats <- apply(x[, cols], 1, language_list)
x$treats
#> [1] "snap" "snap, crackle, and yummy"
#> [3] "crackle and pop"
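To show the oxford_comma switch in use, here is a minimal sketch that repeats the function so it runs on its own. The column name treats_plain is only a placeholder for this illustration; it is not part of the original answer.

```r
x <- data.frame(
  ID = 1:3,
  col1 = c("snap", "snap", NA),
  col2 = c(NA, "crackle", "crackle"),
  col3 = c(NA, NA, "pop"),
  col4 = c(NA, "yummy", NA),
  stringsAsFactors = FALSE
)

# Same function as above, repeated so this snippet is self-contained
language_list <- function(x, oxford_comma = TRUE) {
  x <- x[!is.na(x)]
  if (length(x) < 2) {
    return(x)
  }
  last <- tail(x, 1)
  rest <- head(x, -1)
  if (length(rest) == 1) {
    return(paste(rest, "and", last))
  }
  rest <- paste(rest, collapse = ", ")
  paste0(rest, if (oxford_comma) ",", " and ", last)
}

cols <- paste0("col", 1:4)

# Extra arguments given to apply() are forwarded to language_list().
# treats_plain is a hypothetical column name used only for this example.
x$treats_plain <- apply(x[, cols], 1, language_list, oxford_comma = FALSE)
x$treats_plain
#> [1] "snap"                    "snap, crackle and yummy"
#> [3] "crackle and pop"
```

Note that if a row were entirely NA, language_list would return a zero-length vector and apply() would return a list rather than a character vector, so that case would need separate handling.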
> x <- data.frame(ID = 1:3,
col1 = c("snap", "snap", NA),
col2 = c(NA, "crackle", "crackle"),
col3 = c(NA, NA, "pop"),
col4 = c(NA, "yummy", NA),stringsAsFactors = F)
> a=gsub("(\\w)\\s+","\\1, ",trimws(do.call(paste,replace(x[-1],is.na(x[-1]),""))))
> (x1=transform(x,treat=gsub(",\\s(\\w+)$",", and \\1",a),stringsAsFactors=F))
ID col1 col2 col3 col4 treat
1 1 snap <NA> <NA> <NA> snap
2 2 snap crackle <NA> yummy snap, crackle, and yummy
3 3 <NA> crackle pop <NA> crackle, and pop
> x1$treat[1]
[1] "snap"
> x1$treat[2]
[1] "snap, crackle, and yummy"
> x1$treat[3]
[1] "crackle, and pop"
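Note that the result above gives "crackle, and pop" for the two-item row, while the question asks for "crackle and pop". One possible way to get there with regular expressions is sketched below; it is a variation, not the original answer's code, and it assumes that none of the values themselves contain a comma.

```r
x <- data.frame(
  ID = 1:3,
  col1 = c("snap", "snap", NA),
  col2 = c(NA, "crackle", "crackle"),
  col3 = c(NA, NA, "pop"),
  col4 = c(NA, "yummy", NA),
  stringsAsFactors = FALSE
)
cols <- paste0("col", 1:4)

# Comma-separated list per row, with NAs dropped
a <- apply(x[, cols], 1, function(y) paste(y[!is.na(y)], collapse = ", "))

# Exactly two items: replace the single comma with " and"
a <- sub("^([^,]+), ([^,]+)$", "\\1 and \\2", a)

# Three or more items: insert "and" after the final comma
x$treat <- sub(", ([^,]+)$", ", and \\1", a)
x$treat
#> [1] "snap"                     "snap, crackle, and yummy"
#> [3] "crackle and pop"
```

Building the comma-separated string with paste(collapse = ", ") instead of splitting on whitespace also means values containing spaces stay intact.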
You can also use collapse from the glue package (called glue_collapse in glue 1.3.0 and later):
> x$treat=apply(x[-1],1,function(y)glue::glue_collapse(y[!is.na(y)],", ",last = ", and "))
> x$treat[1]
[1] "snap"
> x$treat[2]
[1] "snap, crackle, and yummy"
> x$treat[3]
[1] "crackle, and pop"