Grouping by a dictionary in R


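My task is to identify which group a sentence belongs to based on the specific words it uses, for example, identifying which color is used to describe an animal. I have a dictionary containing the words I want to identify in this way: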

df <- data.frame(id = c(1:5), pets = c("brown dog", "black cat", "orange cat", "black bird", "white hamster"))

dictionary <- c("black", "orange", "white", "brown", "green", "red")

I need to match each pet against the dictionary to show which category it belongs to. My final df should look like this:

final_df <- data.frame(id = c(1:5), 
pets = c("brown dog", "black cat", "orange cat", "black bird", "white hamster"), 
color = c("brown", "black", "orange", "black", "white"))
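Because every description here is a short space-separated phrase, the expected color column can also be derived without regular expressions: split each phrase into words and keep the first word that appears in dictionary. This is a minimal base-R sketch, assuming colors always appear as whole, lowercase, space-separated words (punctuation such as "cat," would stay attached to the word). The helper name first_dictionary_word is hypothetical.

```r
df <- data.frame(id = 1:5,
                 pets = c("brown dog", "black cat", "orange cat", "black bird", "white hamster"),
                 stringsAsFactors = FALSE)
dictionary <- c("black", "orange", "white", "brown", "green", "red")

# Hypothetical helper: return the first word of `phrase` that is in `dict`,
# or NA if none of its words are in the dictionary.
first_dictionary_word <- function(phrase, dict) {
  words <- strsplit(phrase, "\\s+")[[1]]  # split on runs of whitespace
  hit <- words[words %in% dict]           # keep only dictionary words, in order
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Apply the helper to every description; extra args are passed on to it
df$color <- vapply(df$pets, first_dictionary_word, character(1),
                   dict = dictionary, USE.NAMES = FALSE)
df$color
# "brown" "black" "orange" "black" "white"
```

Exact word matching with %in% avoids the partial-word problem that a plain regex alternation would have, at the cost of being sensitive to punctuation and capitalization.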

r string dictionary group-by match
1 answer

Using the stringr package:

library(stringr)

regex <- str_c("\\b", dictionary, "\\b", collapse = "|")
color <- str_extract(df$pets, regex)
# "brown"  "black"  "orange" "black"  "white" 
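The \\b anchors around each dictionary word matter: without them, a short entry such as "red" would also match inside a longer word. A small base-R check of this, using two made-up descriptions ("red fox", "shredded paper") purely for illustration:

```r
dictionary <- c("black", "orange", "white", "brown", "green", "red")
pets <- c("red fox", "shredded paper")

# Plain alternation: "red" also matches inside "shredded"
loose <- paste(dictionary, collapse = "|")
# Same alternation with each word wrapped in \b ... \b, as in the answer
strict <- paste0("\\b", dictionary, "\\b", collapse = "|")

grepl(loose, pets)   # TRUE TRUE
grepl(strict, pets)  # TRUE FALSE
```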

In base R:

regex <- paste0(".*(", paste0("\\b", dictionary, "\\b", collapse = "|"), ").*")

color <- sub(regex, "\\1", df$pets)
# "brown"  "black"  "orange" "black"  "white" 
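One caveat with the sub() approach: when a description contains no dictionary word, the pattern does not match and sub() returns the whole description unchanged. Also, because the leading .* is greedy, the last matching color wins if a description contains several. If you would rather get NA for no match and the first color found (closer to what str_extract() does), regexpr() plus regmatches() is one base-R option. A sketch, where the sixth description "grey mouse" is made up for illustration:

```r
df <- data.frame(id = 1:6,
                 pets = c("brown dog", "black cat", "orange cat",
                          "black bird", "white hamster", "grey mouse"),
                 stringsAsFactors = FALSE)
dictionary <- c("black", "orange", "white", "brown", "green", "red")

regex <- paste0("\\b", dictionary, "\\b", collapse = "|")

# regexpr() gives the position of the first match, or -1 when there is none
m <- regexpr(regex, df$pets)

# regmatches() returns only the matched substrings, so fill the rest with NA
color <- rep(NA_character_, length(df$pets))
color[m != -1] <- regmatches(df$pets, m)
color
# "brown" "black" "orange" "black" "white" NA
```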