Search a column for multiple keywords and create a column for each keyword


I have the following data.

stringstosearch <- c("to", "and", "at", "from", "is", "of")

set.seed(199)
id <- rnorm(5)
x  <- c("Contrary to popular belief, Lorem Ipsum is not simply random text.",
       "A Latin professor at Hampden-Sydney College in Virginia",
       "It has roots in a piece of classical Latin ", 
       "literature from 45 BC, making it over 2000 years old.", 
       "The standard chunk of Lorem Ipsum used since")
datatxt <- data.frame(id, x)


I want to search for the keywords listed in stringstosearch and create a separate column with the result for each keyword.

I can already detect whether any of the keywords occurs in each row:

library(stringr)

datatxt$result <- str_detect(datatxt$x, paste0(stringstosearch, collapse = '|'))

datatxt$result

> datatxt$result
[1] TRUE TRUE TRUE TRUE TRUE
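Why the combined call returns a single TRUE per row: paste0(..., collapse = "|") turns the keywords into one regex alternation, so each string only gets an answer to "does any keyword occur?". The sketch below uses two rows of the question's data and base grepl() in place of str_detect() so it runs without extra packages; for plain word patterns like these the two should behave the same.

```r
# Two rows taken from the question's data
stringstosearch <- c("to", "and", "at", "from", "is", "of")
x <- c("Contrary to popular belief, Lorem Ipsum is not simply random text.",
       "The standard chunk of Lorem Ipsum used since")

# paste0(..., collapse = "|") builds ONE regex alternation
combined <- paste0(stringstosearch, collapse = "|")
combined
#> [1] "to|and|at|from|is|of"

# A single pattern gives a single logical per string: TRUE if ANY alternative matches
any_hit <- grepl(combined, x)

# One call per keyword gives one logical per (string, keyword) pair;
# sapply() names the resulting matrix columns after the keywords
per_keyword <- sapply(stringstosearch, function(k) grepl(k, x))
per_keyword
```

Note that the "and" column is TRUE for both rows, because "and" occurs inside "random" and "standard"; plain substring matching cannot distinguish those from the word "and".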

But I want a separate result for each string in stringstosearch. Any idea how to do that?

The result should look like this (or similar):

          id                                                                  x    to   and    at  from    is    of
1 -1.9091427 Contrary to popular belief, Lorem Ipsum is not simply random text.  TRUE FALSE FALSE FALSE  TRUE  TRUE
2  0.5551667            A Latin professor at Hampden-Sydney College in Virginia FALSE FALSE  TRUE FALSE FALSE FALSE
3 -2.2163365                        It has roots in a piece of classical Latin  FALSE FALSE FALSE FALSE FALSE FALSE
4  0.4941455              literature from 45 BC, making it over 2000 years old. FALSE FALSE FALSE  TRUE FALSE FALSE
5 -0.5805710                       The standard chunk of Lorem Ipsum used since FALSE FALSE FALSE FALSE FALSE FALSE

Any idea how to achieve this?

Tags: r, tidyverse, stringr, stringi
1 Answer

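Here is a base R approach. We use sprintf() to wrap each pattern in \b word-boundary anchors (written "\\b" inside an R string). This means, for example, that "and" will not match "random".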

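A standalone check of the word-boundary point, using base grepl(), whose default regex engine understands \b. The sample words are made up for illustration:

```r
# Words that contain "and" only as a substring, plus one real match
words <- c("random", "standard", "Latin", "rock and roll")

# Plain pattern: matches "and" anywhere, including inside other words
plain <- grepl("and", words)

# Anchored pattern: \b (written "\\b" in an R string) only matches at a word edge
bounded <- grepl("\\band\\b", words)

# The same anchoring keeps "at" from matching inside "Latin"
at_in_latin <- grepl("\\bat\\b", "Latin")
```

The same anchoring is what keeps "at" out of "Latin" and "literature" in the question's data.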
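# "\\b" is the regex word boundary \b, so e.g. "and" does not match inside "random".
# The \(p) lambda shorthand needs R >= 4.1; use function(p) on older versions.
datatxt[stringstosearch] <- lapply(
    sprintf("\\b%s\\b", stringstosearch), \(p) grepl(p, datatxt$x)
)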

Output:

#           id                                                                  x    to   and    at  from    is    of
# 1 -1.9091427 Contrary to popular belief, Lorem Ipsum is not simply random text.  TRUE FALSE FALSE FALSE  TRUE FALSE
# 2  0.5551667            A Latin professor at Hampden-Sydney College in Virginia FALSE FALSE  TRUE FALSE FALSE FALSE
# 3 -2.2163365                        It has roots in a piece of classical Latin  FALSE FALSE FALSE FALSE FALSE  TRUE
# 4  0.4941455              literature from 45 BC, making it over 2000 years old. FALSE FALSE FALSE  TRUE FALSE FALSE
# 5 -0.5805710                       The standard chunk of Lorem Ipsum used since FALSE FALSE FALSE FALSE FALSE  TRUE
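One caveat of assigning straight to datatxt[stringstosearch]: if a keyword happened to equal an existing column name such as x or id, that column would be overwritten. A minimal base R sketch, assuming a hypothetical has_ prefix for the new columns and a trimmed three-row copy of the data with a simple integer id, builds the logical columns separately and binds them on:

```r
# Trimmed copy of the question's data (three rows, simple integer id)
stringstosearch <- c("to", "and", "at", "from", "is", "of")
datatxt <- data.frame(
  id = 1:3,
  x = c("Contrary to popular belief, Lorem Ipsum is not simply random text.",
        "A Latin professor at Hampden-Sydney College in Virginia",
        "The standard chunk of Lorem Ipsum used since")
)

# One logical vector per keyword, each pattern anchored with \b on both sides
hits <- lapply(stringstosearch, function(k) grepl(sprintf("\\b%s\\b", k), datatxt$x))

# Hypothetical "has_" prefix so a keyword equal to "x" or "id"
# cannot overwrite an existing column
names(hits) <- paste0("has_", stringstosearch)

# Bind the new logical columns next to the original ones
datatxt <- cbind(datatxt, as.data.frame(hits))
datatxt
```

Building a list first, rather than calling sapply(), keeps the result a data frame even when datatxt has a single row, where sapply() would simplify to a plain vector.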
© www.soinside.com 2019 - 2024. All rights reserved.