In R, I have a list of companies, for example:
companies <- data.frame(Name=c("Company A Inc (COMPA)","Company B (BEELINE)", "Company C Inc. (Coco)", "Company D Inc.", "Company E"))
I want to remove the text in parentheses, so that I end up with the following list:
            Name
1  Company A Inc
2      Company B
3 Company C Inc.
4 Company D Inc.
5      Company E
One approach I tried was to split the string and then use ldply:
library(plyr)  # provides ldply
companies$Name <- as.character(companies$Name)
c <- strsplit(companies$Name, "\\(")
ldply(c)
But because not all company names have a parenthesized part, it fails:
Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
Results do not have equal lengths
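To see why the pieces cannot be bound into one data frame, it may help to count how many pieces each name splits into. This is a minimal sketch in base R; the variable names `name`, `pieces` and `n_pieces` are illustrative and not part of the original code.

```r
# The names from the question, as a plain character vector
name <- c("Company A Inc (COMPA)", "Company B (BEELINE)",
          "Company C Inc. (Coco)", "Company D Inc.", "Company E")

# Split each name at the opening parenthesis
pieces <- strsplit(name, "\\(")

# Count the pieces per name: names without "(" yield a single piece,
# so the list elements do not all have the same length
n_pieces <- lengths(pieces)
print(n_pieces)
```

The first three names give two pieces each and the last two give one, which is the unequal-lengths condition the error message complains about.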
I'm not married to the strsplit solution. Anything that removes that text and the parentheses would be fine.
A gsub should work here:
gsub("\\s*\\([^\\)]+\\)","",as.character(companies$Name))
# [1] "Company A Inc" "Company B" "Company C Inc."
# [4] "Company D Inc." "Company E"
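Here we just replace occurrences of "(...)" with nothing (also removing any leading whitespace). R makes it look worse than it is because of all the escaping we have to do for the parentheses, which are special characters in regular expressions.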
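The pattern can be easier to read when it is built up piece by piece. The sketch below assembles the same regular expression with paste0 and writes the result back into the data frame; the `pattern` variable and the `stringsAsFactors = FALSE` argument are my additions for illustration.

```r
companies <- data.frame(
  Name = c("Company A Inc (COMPA)", "Company B (BEELINE)",
           "Company C Inc. (Coco)", "Company D Inc.", "Company E"),
  stringsAsFactors = FALSE
)

# The same pattern as above, assembled from its parts
pattern <- paste0(
  "\\s*",     # any whitespace before the opening parenthesis
  "\\(",      # a literal "("
  "[^\\)]+",  # one or more characters other than ")"
  "\\)"       # a literal ")"
)

# Replace every match with the empty string and store the result
companies$Name <- gsub(pattern, "", companies$Name)
print(companies$Name)
```

Names without a parenthesized part contain no match and are returned unchanged.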
You could use stringr::str_replace. This is nice because it accepts factor variables.
companies <- data.frame(Name=c("Company A Inc (COMPA)","Company B (BEELINE)",
"Company C Inc. (Coco)", "Company D Inc.",
"Company E"))
library(stringr)
str_replace(companies$Name, " \\(.*\\)", "")
# [1] "Company A Inc" "Company B" "Company C Inc."
# [4] "Company D Inc." "Company E"
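One thing to keep in mind with this pattern is that ".*" is greedy and str_replace only replaces the first match. For the names in the question that makes no difference, but for a name with two parenthesized parts it might. The sketch below uses a made-up name (not from the question) and assumes stringr is installed.

```r
library(stringr)

# Hypothetical name with two parenthesized parts (not from the question)
x <- "Acme (US) Holdings (ACME)"

# Greedy ".*" runs from the first "(" to the last ")"
greedy <- str_replace(x, " \\(.*\\)", "")

# "[^)]*" stops at the first ")", and str_replace_all handles every match
each <- str_replace_all(x, " \\([^)]*\\)", "")

print(greedy)
print(each)
```

Which of the two results is wanted depends on the data; if every name has at most one trailing parenthesized part, both patterns give the same answer.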
If you still want to use strsplit, you could do
companies$Name <- as.character(companies$Name)
unlist(strsplit(companies$Name, " \\(.*\\)"))
# [1] "Company A Inc" "Company B" "Company C Inc."
# [4] "Company D Inc." "Company E"
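The unlist approach relies on every name producing exactly one piece, which holds here because the parenthesized part is always at the end. A possible variant that does not depend on that is to split at " (" and keep only the first piece of each name. This is a sketch in base R; `name` and `first_piece` are illustrative names.

```r
name <- c("Company A Inc (COMPA)", "Company B (BEELINE)",
          "Company C Inc. (Coco)", "Company D Inc.", "Company E")

# Split at " (" and keep only the first piece of each name;
# vapply guarantees one character value per input element
first_piece <- vapply(strsplit(name, " \\("),
                      function(p) p[1],
                      character(1))
print(first_piece)
```

Because vapply returns exactly one value per name, the result always has the same length as the input and can be assigned straight back to companies$Name.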
You could also use:
library(qdap)
companies$Name <- genX(companies$Name, " (", ")")
companies
            Name
1  Company A Inc
2       CompanyB
3 Company C Inc.
4 Company D Inc.
5       CompanyE
Or, also with qdap, using bracketX:
library(qdap)
companies$Name <- bracketX(companies$Name)