How to find the position of a word from another column in the same data frame in R?


I have two columns in the data frame train:

subject                  | keyword
the box is beauty        | box
delivery reached on time | delivery
they serve well          | serve

How can I find the position of the keyword within the subject?

Currently, I am using a for loop:

for (k in seq_len(nrow(train))) {
  # gregexpr() returns every match position; keep the first one (-1 = not found)
  l <- unlist(gregexpr(train$keyword[k], train$subject[k], ignore.case = TRUE))
  train$position[k] <- l[1]
}
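gregexpr() can return more than one position per row, which matters when its result is stored in a single cell of the data frame. A small illustration (the string and patterns are made up for this example) of how gregexpr() and regexpr() differ:

```r
subj <- "they serve well"

# gregexpr() returns a list holding an integer vector of *all* match starts
all_starts <- as.integer(gregexpr("e", subj)[[1]])

# regexpr() returns only the first match start
first_start <- as.integer(regexpr("e", subj))

# Both return -1 when the pattern does not occur at all
no_match <- as.integer(regexpr("box", subj))

all_starts   # 3 7 10 13
first_start  # 3
no_match     # -1
```

as.integer() is used only to drop the match.length and related attributes so the printed values are plain integers.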

Is there another way to do this?
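For reference, a base R sketch that needs no extra packages and no explicit for loop. It assumes the columns are called subject and keyword (as in the first answer) and uses fixed = TRUE so each keyword is matched as a literal string rather than a regular expression; regexpr() gives -1 for rows where the keyword is absent.

```r
train <- data.frame(
  subject = c("the box is beauty", "delivery reached on time", "they serve well"),
  keyword = c("box", "delivery", "serve"),
  stringsAsFactors = FALSE
)

# regexpr() is vectorised over the text but takes a single pattern, so mapply()
# pairs each keyword with the subject in the same row.
train$position <- mapply(
  function(kw, subj) regexpr(kw, subj, fixed = TRUE)[1],
  train$keyword, train$subject,
  USE.NAMES = FALSE
)

train$position  # 5 1 6
```

Note that fixed = TRUE cannot be combined with ignore.case = TRUE; drop fixed = TRUE if case-insensitive regex matching is needed.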

Tags: r

2 answers

Answer 1 (0 votes)

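No loop is needed: just use the locate functions from the stringr or stringi package. They are vectorized over both the string and the pattern, so you can pass the whole subject and keyword columns at once.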

train <- data.frame(subject = c("the box is beauty", "delivery reached on time", "they serve well"), 
                    keyword = c("box", "delivery", "serve"), 
                    stringsAsFactors = FALSE)


library(stringr)
train$position_stringr <- str_locate(train$subject, train$keyword)[,1]
# str_locate() returns a matrix (columns "start" and "end"); keep only the start.

library(stringi)
train$position_stringi <- stri_locate_first(train$subject, regex = train$keyword)[,1]
# stri_locate_first() also returns a start/end matrix; again keep only the start.

train
                   subject  keyword position_stringr position_stringi
1        the box is beauty      box                5                5
2 delivery reached on time delivery                1                1
3          they serve well    serve                6                6
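Both calls above treat each keyword as a regular expression, so a keyword containing "." or "+" may match more than intended. A sketch using stringr's fixed() to match keywords literally; the data here is hypothetical and chosen so that the regex and literal results differ:

```r
library(stringr)

# Hypothetical data: "." in a regex means "any character", so "1.5" also matches "105".
train <- data.frame(
  subject = c("rate 105 vs 1.5", "the box is beauty"),
  keyword = c("1.5", "box"),
  stringsAsFactors = FALSE
)

# Default (regex) matching finds "105" first
train$pos_regex <- str_locate(train$subject, train$keyword)[, "start"]

# fixed() matches each keyword literally and is still vectorised row by row
train$pos_fixed <- str_locate(train$subject, fixed(train$keyword))[, "start"]

train[, c("pos_regex", "pos_fixed")]  # 6 / 13 in row 1, 5 / 5 in row 2
```

The stringi equivalent would be stri_locate_first_fixed() instead of the regex variant.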

Answer 2 (0 votes)

You can use the following, which also makes the match case-insensitive:

#data.frame created using the below statements
Subject <- c("the box is beauty","delivery reached on time","they serve well")
Keyword <- c("box","delivery","serve")
train <- data.frame(Subject,Keyword)


#Solution
library(stringr)
for (k in seq_len(nrow(train))) {
  # as.character() guards against factor columns (the R < 4.0 default)
  t1 <- as.character(train$Subject[k])
  t2 <- as.character(train$Keyword[k])
  # stringr's regex() takes ignore_case (not ignore.case), and R spells it TRUE
  locate_vector <- str_locate(t1, regex(t2, ignore_case = TRUE))
  train$start_position[k] <- locate_vector[1, "start"]
  # If the end position is also required, use locate_vector[1, "end"].
}
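The loop above can also be written without a loop, because regex() accepts a whole character vector and str_locate() pairs each subject with the keyword in the same row. A sketch, with mixed case added to the sample data (hypothetical) to show that the match ignores case:

```r
library(stringr)

train <- data.frame(
  Subject = c("The Box is beauty", "delivery reached on time", "they SERVE well"),
  Keyword = c("box", "Delivery", "serve"),
  stringsAsFactors = FALSE
)

# One call for all rows; ignore_case = TRUE makes every match case-insensitive
loc <- str_locate(train$Subject, regex(train$Keyword, ignore_case = TRUE))

train$start_position <- loc[, "start"]
train$end_position   <- loc[, "end"]  # only if the end of the match is needed
train
```

Rows where the keyword does not occur get NA in both columns.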
