Split a text column into two parts based on a word stored in another variable in R


I have this data frame:

df <- data.frame(ID = c(1,2,3),
text = c("A big basket of fruits having apples, green bananas, and peaches",
"A small basket of fruits having green bananas, apples, and peaches",
"A red colored basket of fruits having peaches, green bananas, and apples"),
splitter = c("apples", "bananas", "peaches"))

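I want to split the "text" column into two new columns ("preamble" and "body") based on the "splitter" variable. Everything before the splitter word should go in the "preamble" column, and everything from the splitter word onward should go in the "body" column. The result should be: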

df$preamble <- c("A big basket of fruits having", "A small basket of fruits having", "A red colored basket of fruits having")

I have tried sub, gsub, and str_split. Nothing seems to work. dplyr-style code would be most helpful.

r string strsplit
1 Answer

Here is a solution using functions from stringr.

library(tidyverse)
df <- data.frame(ID = c(1,2,3),
                 text = c("A big basket of fruits having apples, green bananas, and peaches",
                          "A small basket of fruits having green bananas, apples, and peaches",
                          "A red colored basket of fruits having peaches, green bananas, and apples"),
                 splitter = c("apples", "bananas", "peaches"))
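# str_locate() returns a matrix of start/end positions; column 1 is the start of the first match.
# Subtracting 2 ends the preamble just before the space that precedes the splitter.
df |> mutate(preamble = str_sub(text,start=1,end=str_locate(text,splitter)[,1] -2),
             # The body starts at the splitter and runs to the end of the string.
             body = str_sub(text,start=str_locate(text,splitter)[,1]))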
#>   ID                                                                     text
#> 1  1         A big basket of fruits having apples, green bananas, and peaches
#> 2  2       A small basket of fruits having green bananas, apples, and peaches
#> 3  3 A red colored basket of fruits having peaches, green bananas, and apples
#>   splitter                              preamble
#> 1   apples         A big basket of fruits having
#> 2  bananas A small basket of fruits having green
#> 3  peaches A red colored basket of fruits having
#>                                 body
#> 1 apples, green bananas, and peaches
#> 2       bananas, apples, and peaches
#> 3 peaches, green bananas, and apples

Created on 2024-10-19 with reprex v2.1.1
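One thing to keep in mind with the approach above: stringr interprets the pattern as a regular expression by default, so a splitter containing characters such as `.` or `(` would not be matched literally. The following is a minimal base R sketch of the same split that matches the splitter as a fixed string and returns NA when the splitter does not occur in the text. The helper name `split_at_word` is hypothetical and is not part of the original answer or of any package.

```r
# Same example data as in the question.
df <- data.frame(ID = c(1, 2, 3),
                 text = c("A big basket of fruits having apples, green bananas, and peaches",
                          "A small basket of fruits having green bananas, apples, and peaches",
                          "A red colored basket of fruits having peaches, green bananas, and apples"),
                 splitter = c("apples", "bananas", "peaches"),
                 stringsAsFactors = FALSE)

# Hypothetical helper (an assumption for illustration): split each string at
# the first literal occurrence of its own splitter word.
split_at_word <- function(text, splitter) {
  # regexpr() is not vectorised over the pattern, so match row by row.
  # fixed = TRUE treats the splitter as plain text, not as a regex.
  pos <- mapply(function(t, s) regexpr(s, t, fixed = TRUE)[1],
                text, splitter, USE.NAMES = FALSE)
  # regexpr() returns -1 when there is no match; turn that into NA.
  pos <- ifelse(pos == -1L, NA_integer_, pos)
  data.frame(
    # Everything before the splitter, with the trailing space removed.
    preamble = trimws(substr(text, 1, pos - 1)),
    # The splitter itself and everything after it.
    body = substr(text, pos, nchar(text)),
    stringsAsFactors = FALSE
  )
}

result <- cbind(df, split_at_word(df$text, df$splitter))
print(result[, c("splitter", "preamble", "body")])
```

For the example data this should produce the same preamble and body columns as the stringr version. In a dplyr pipeline the helper can probably be used as `df |> mutate(split_at_word(text, splitter))`, since `mutate()` unpacks an unnamed data frame result into separate columns.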
