I have the following data.
stringstosearch <- c("to", "and", "at", "from", "is", "of")
set.seed(199)
id <- c(rnorm(5))
x <- c("Contrary to popular belief, Lorem Ipsum is not simply random text.",
"A Latin professor at Hampden-Sydney College in Virginia",
"It has roots in a piece of classical Latin ",
"literature from 45 BC, making it over 2000 years old.",
"The standard chunk of Lorem Ipsum used since")
datatxt <- data.frame(id, x)
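I want to search for the keywords listed in stringstosearch and create a column with the result for each keyword.
I can get a single combined result with: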
library(stringr)
datatxt$result <- str_detect(datatxt$x, paste0(stringstosearch, collapse = '|'))
datatxt$result
> datatxt$result
[1] TRUE TRUE TRUE TRUE TRUE
But I want a separate result for each string in stringstosearch. Any idea how to do that?
The result should look like this, or something similar:
id x to and at from is of
1 -1.9091427 Contrary to popular belief, Lorem Ipsum is not simply random text. TRUE FALSE FALSE FALSE TRUE TRUE
2 0.5551667 A Latin professor at Hampden-Sydney College in Virginia FALSE FALSE TRUE FALSE FALSE FALSE
3 -2.2163365 It has roots in a piece of classical Latin FALSE FALSE FALSE FALSE FALSE FALSE
4 0.4941455 literature from 45 BC, making it over 2000 years old. FALSE FALSE FALSE TRUE FALSE FALSE
5 -0.5805710 The standard chunk of Lorem Ipsum used since FALSE FALSE FALSE FALSE FALSE FALSE
Any idea how to achieve this?
Here is a base R approach. We use sprintf() to wrap each pattern in \\b word-boundary anchors. This means that, for example, "and" will not match "random".
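# Build one whole-word pattern per keyword (e.g. "\\band\\b") and add one
# logical column per keyword. The \(x) lambda shorthand needs R >= 4.1.
datatxt[stringstosearch] <- lapply(
  sprintf("\\b%s\\b", stringstosearch), \(x) grepl(x, datatxt$x)
)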
Output:
# id x to and at from is of
# 1 -1.9091427 Contrary to popular belief, Lorem Ipsum is not simply random text. TRUE FALSE FALSE FALSE TRUE FALSE
# 2 0.5551667 A Latin professor at Hampden-Sydney College in Virginia FALSE FALSE TRUE FALSE FALSE FALSE
# 3 -2.2163365 It has roots in a piece of classical Latin FALSE FALSE FALSE FALSE FALSE TRUE
# 4 0.4941455 literature from 45 BC, making it over 2000 years old. FALSE FALSE FALSE TRUE FALSE FALSE
# 5 -0.5805710 The standard chunk of Lorem Ipsum used since FALSE FALSE FALSE FALSE FALSE TRUE
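To see what the word-boundary anchors change, here is a small sketch that compares plain substring patterns with the anchored ones on the same five sentences. The helper detect() and the names plain_hits and anchored_hits are made up for this illustration and are not part of the answer above.

```r
stringstosearch <- c("to", "and", "at", "from", "is", "of")
x <- c("Contrary to popular belief, Lorem Ipsum is not simply random text.",
       "A Latin professor at Hampden-Sydney College in Virginia",
       "It has roots in a piece of classical Latin ",
       "literature from 45 BC, making it over 2000 years old.",
       "The standard chunk of Lorem Ipsum used since")

# Whole-word versions of the keywords, e.g. "\\band\\b"
anchored <- sprintf("\\b%s\\b", stringstosearch)

# Hypothetical helper: returns a logical matrix with one row per sentence
# and one column per keyword
detect <- function(patterns) {
  m <- vapply(patterns, function(p) grepl(p, x), logical(length(x)))
  colnames(m) <- stringstosearch
  m
}

plain_hits    <- detect(stringstosearch)  # substring matches
anchored_hits <- detect(anchored)         # whole-word matches only

# Cells where only the unanchored pattern matched, i.e. the keyword
# occurs only inside a longer word such as "random" or "Latin"
which(plain_hits & !anchored_hits, arr.ind = TRUE)
```

Any cell listed by the last call is a match that the \\b anchors filter out. If substring matches are actually what you want, dropping the anchors from the sprintf() format should be enough.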