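I have two columns in a data frame, train:
subject | keyword
the box is beauty | box
delivery reached on time | delivery
they serve well | serve
How can I find the position of the keyword within the subject?
Currently, I am using a for loop: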
for (k in 1:nrow(train)) {
  # gregexpr returns a list of match positions; unlist flattens it to a vector
  l <- unlist(gregexpr(train$keyword[k], train$subject[k], ignore.case = TRUE))
  train$position[k] <- l
}
Is there any other way?
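There is no need for a loop. The locate functions in the stringr or stringi packages are vectorised over both the string and the pattern, so they can be applied to the whole columns at once.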
train <- data.frame(subject = c("the box is beauty", "delivery reached on time", "they serve well"),
keyword = c("box", "delivery", "serve"),
stringsAsFactors = FALSE)
library(stringr)
train$position_stringr <- str_locate(train$subject, train$keyword)[,1]
# str_locate returns a matrix with start and end columns; [,1] keeps only the start of the keyword.
library(stringi)
train$position_stringi <- stri_locate_first(train$subject, regex = train$keyword)[,1]
# stri_locate_first also returns a start/end matrix; again only the start column is kept.
train
                   subject  keyword position_stringr position_stringi
1        the box is beauty      box                5                5
2 delivery reached on time delivery                1                1
3          they serve well    serve                6                6
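The loop in the question passes ignore.case, which the vectorised calls above leave out. Here is a minimal sketch of a case-insensitive variant, assuming the keywords should match regardless of case. The sample data is altered for illustration (a capitalised "Box" and a keyword that never occurs), and the column name position_ci is made up for this example.

```r
library(stringr)

# Illustrative data: row 1 differs in case, row 3 has no match at all.
train <- data.frame(subject = c("The Box is beauty", "delivery reached on time", "they serve well"),
                    keyword = c("box", "delivery", "missing"),
                    stringsAsFactors = FALSE)

# regex() wraps the patterns so that matching ignores case;
# str_locate is still vectorised over subject and keyword.
train$position_ci <- str_locate(train$subject, regex(train$keyword, ignore_case = TRUE))[, "start"]
train$position_ci
```

Rows where the keyword does not occur come back as NA rather than -1, which is worth keeping in mind when comparing against the gregexpr version.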
You can use the following.
#data.frame created using the below statements
Subject <- c("the box is beauty","delivery reached on time","they serve well")
Keyword <- c("box","delivery","serve")
train <- data.frame(Subject,Keyword)
#Solution
library(stringr)
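for(k in seq_len(nrow(train)))
{
  # as.character guards against the columns having been read in as factors
  t1 <- as.character(train$Subject[k])
  t2 <- as.character(train$Keyword[k])
  # str_locate returns a one-row start/end matrix; [[1]] picks the start
  locate_vector <- str_locate(t1, regex(t2, ignore_case = TRUE))[[1]]
  train$start_position[k] <- locate_vector
  #If the end position is also required, the second column from str_locate
  #function could be used.
}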
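If installing a package is not an option, the same lookup can probably be done in base R without an explicit for loop. The sketch below assumes only the first match per row is wanted. It uses mapply because regexpr is not vectorised over the pattern, and the name pos and the sample data are made up for illustration.

```r
# Illustrative data: row 1 differs in case, row 3 has no match at all.
train <- data.frame(subject = c("The Box is beauty", "delivery reached on time", "they serve well"),
                    keyword = c("box", "delivery", "missing"),
                    stringsAsFactors = FALSE)

# Apply regexpr pairwise to each keyword/subject pair.
# as.integer drops the match.length attributes that regexpr attaches.
pos <- mapply(function(p, s) as.integer(regexpr(p, s, ignore.case = TRUE)),
              train$keyword, train$subject, USE.NAMES = FALSE)

# regexpr signals "no match" with -1; convert that to NA.
pos[pos == -1L] <- NA_integer_
train$position <- pos
train$position
```

This still calls regexpr once per row under the hood, so it is a tidier way to write the loop more than a speed-up.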