I have a `data.frame` of the following kind:

id | text_information |
---|---|
1 | Increase from 10% to 60% |
2 | Purchase 100% |
3 | Increase from 5% to 45% |
4 | Purchase 99% |

I would like to process the `text_information` variable (a `character` column) to get the following output:

id | share | share_difference | type |
---|---|---|---|
1 | 0.6 | 0.5 | increase |
2 | 1 | NA | purchase |
3 | 0.45 | 0.4 | increase |
4 | 0.99 | NA | purchase |

Any suggestions on how to do this in R?

Using regular expressions:

    library(dplyr)
    library(stringr)

    data.frame(
      id = 1:4,
      text_information = c(
        "Increase from 10% to 60%",
        "Purchase 100%",
        "Increase from 5% to 45%",
        "Purchase 99%"
      )
    ) %>%
      mutate(
        # First percentage in the string (the `group` argument needs stringr >= 1.5.0)
        share_1 = as.numeric(str_extract(text_information, "(\\d+)%", 1)),
        # Second percentage, present only for "from X% to Y%" strings; NA otherwise
        share_2 = as.numeric(str_extract(text_information, "(\\d+)% to (\\d+)%$", 2)),
        # Final share: the second percentage if there is one, else the first
        share = if_else(is.na(share_2), share_1, share_2) / 100,
        # NA propagates for rows that have a single percentage
        share_difference = (share_2 - share_1) / 100,
        type = tolower(str_extract(text_information, "(Increase|Purchase)"))
      ) %>%
      select(id, share, share_difference, type)
    #>   id share share_difference     type
    #> 1  1  0.60              0.5 increase
    #> 2  2  1.00               NA purchase
    #> 3  3  0.45              0.4 increase
    #> 4  4  0.99               NA purchase

Created on 2024-04-08 with reprex v2.1.0
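
If an older stringr (without the `group` argument) is a concern, the same idea can be sketched in base R. This is only an illustration under a few assumptions: the helper name `parse_share` is made up here, every percentage is a whole number followed by `%`, and the first word of each string is taken as the type.

```r
# Hypothetical helper, not part of the answer above: base-R take on the same regex idea.
parse_share <- function(x) {
  # Every run of digits that is directly followed by a percent sign
  pct <- regmatches(x, gregexpr("[0-9]+(?=%)", x, perl = TRUE))
  n <- lengths(pct)
  # First and last percentage per string; NA when a string has none
  first <- vapply(pct, function(p) if (length(p)) as.numeric(p[1]) else NA_real_, numeric(1))
  last <- vapply(pct, function(p) if (length(p)) as.numeric(p[length(p)]) else NA_real_, numeric(1))
  data.frame(
    share = last / 100,
    # A difference only makes sense when at least two percentages were found
    share_difference = ifelse(n >= 2, (last - first) / 100, NA_real_),
    # Assumption: the type is the first word of the string
    type = tolower(sub(" .*$", "", x)),
    stringsAsFactors = FALSE
  )
}

text_information <- c(
  "Increase from 10% to 60%",
  "Purchase 100%",
  "Increase from 5% to 45%",
  "Purchase 99%"
)
result <- cbind(id = seq_along(text_information), parse_share(text_information))
print(result)
```

Taking the first and last match, rather than anchoring on the literal ` to `, keeps the pattern short, but it would also treat any other string with two or more percentages as an increase, so the stricter pattern in the answer may be preferable for messier data.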