I have this data frame:
df <- data.frame(ID = c(1, 2, 3),
                 text = c("A big basket of fruits having apples, green bananas, and peaches",
                          "A small basket of fruits having green bananas, apples, and peaches",
                          "A red colored basket of fruits having peaches, green bananas, and apples"),
                 splitter = c("apples", "bananas", "peaches"))
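I want to split the "text" column into two new columns ("preamble" and "body") based on the "splitter" variable. Everything before the splitter should go into the "preamble" column, and everything after it into the "body" column. The result should be: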
df$preamble <- c("A big basket of fruits having", "A small basket of fruits having", "A red colored basket of fruits having")
I have tried sub, gsub, and str_split, but nothing seems to work. dplyr-style code would be most helpful.
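One plausible reason sub() and gsub() seem not to work here: their `pattern` argument is not vectorised, so something like sub(df$splitter, "", df$text) applies only the first splitter ("apples") to every row and emits a warning. The sketch below is base R rather than dplyr style; it pairs each text with its own splitter via mapply(). It assumes the splitter values are plain words, since sub() and regexpr() treat them as regular expressions.

```r
df <- data.frame(ID = c(1, 2, 3),
                 text = c("A big basket of fruits having apples, green bananas, and peaches",
                          "A small basket of fruits having green bananas, apples, and peaches",
                          "A red colored basket of fruits having peaches, green bananas, and apples"),
                 splitter = c("apples", "bananas", "peaches"))

# Preamble: remove the (optional) whitespace before the splitter, the splitter,
# and everything after it. mapply() gives each row its own pattern.
df$preamble <- mapply(function(txt, sp) sub(paste0("\\s*", sp, ".*$"), "", txt),
                      df$text, df$splitter, USE.NAMES = FALSE)

# Body: keep the text from the first match of the splitter to the end.
df$body <- mapply(function(txt, sp) substring(txt, regexpr(sp, txt)),
                  df$text, df$splitter, USE.NAMES = FALSE)

df[, c("preamble", "body")]
```

Because both functions split at the first match, row 2 keeps "green" in its preamble: the first "bananas" in that text is "green bananas".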
Here is a solution using functions from stringr:
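library(tidyverse)
df <- data.frame(ID = c(1, 2, 3),
                 text = c("A big basket of fruits having apples, green bananas, and peaches",
                          "A small basket of fruits having green bananas, apples, and peaches",
                          "A red colored basket of fruits having peaches, green bananas, and apples"),
                 splitter = c("apples", "bananas", "peaches"))
df |> mutate(
  # str_locate() is vectorised over text and splitter; column 1 is the match start.
  # End two characters early to drop the splitter and the space before it.
  preamble = str_sub(text, start = 1, end = str_locate(text, splitter)[, 1] - 2),
  # Start at the splitter itself and keep everything to the end of the string.
  body = str_sub(text, start = str_locate(text, splitter)[, 1])
)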
#> ID text
#> 1 1 A big basket of fruits having apples, green bananas, and peaches
#> 2 2 A small basket of fruits having green bananas, apples, and peaches
#> 3 3 A red colored basket of fruits having peaches, green bananas, and apples
#> splitter preamble
#> 1 apples A big basket of fruits having
#> 2 bananas A small basket of fruits having green
#> 3 peaches A red colored basket of fruits having
#> body
#> 1 apples, green bananas, and peaches
#> 2 bananas, apples, and peaches
#> 3 peaches, green bananas, and apples
Created on 2024-10-19 with reprex v2.1.1
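A slightly more defensive variant of the same dplyr/stringr approach, sketched under a few assumptions: the splitter should be matched literally (fixed() stops characters such as `.` or `(` from being read as regex syntax), the match position is computed once in a helper column (`split_at`, a name introduced here for illustration), and str_trim() removes the space before the splitter instead of relying on a hard-coded `- 2`.

```r
library(dplyr)
library(stringr)

df <- data.frame(ID = c(1, 2, 3),
                 text = c("A big basket of fruits having apples, green bananas, and peaches",
                          "A small basket of fruits having green bananas, apples, and peaches",
                          "A red colored basket of fruits having peaches, green bananas, and apples"),
                 splitter = c("apples", "bananas", "peaches"))

res <- df |>
  mutate(
    # Locate each row's splitter once; fixed() matches it literally, not as a regex
    split_at = str_locate(text, fixed(splitter))[, "start"],
    # Everything before the splitter, with surrounding whitespace trimmed
    preamble = str_trim(str_sub(text, 1, split_at - 1)),
    # The splitter itself and everything after it
    body = str_sub(text, split_at)
  ) |>
  select(-split_at)

res
```

The result has the same columns and values as the answer above. A row whose splitter does not occur in its text gets NA in both new columns.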