Remove strings containing a colon in R

Question

Here is a sample excerpt of my dataset. It looks like this:

Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id_234;2018/03/02

I want to remove the words that contain a colon. In this case, these would be wa119:d, ax21:3 and bC230:13, so that my new dataset looks like this:

Description;ID;Date
Here comes the first row;id_112;2018/03/02
Here comes the second row;id_115;2018/03/02
Here comes the third row;id_234;2018/03/02

Unfortunately, I could not find a regular expression or solution using gsub. Can anyone help?

Answer 1

Here is one approach:

## reading in your data
dat <- read.table(text ='
Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id:234;2018/03/02
', sep = ';', header = TRUE, stringsAsFactors = FALSE)

## \\w+ = one or more word characters
gsub('\\w+:\\w+\\s+', '', dat$Description)

## [1] "Here comes the first row"  
## [2] "Here comes the second row"
## [3] "Here comes the third row"

More on \\w, a shorthand character class equivalent to [A-Za-z0-9_]: https://www.regular-expressions.info/shorthand.html
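Note that the pattern above requires whitespace after the colon word, so a colon word at the very end of a string would be left in place. A minimal sketch of a more permissive variant follows, assuming words are separated by whitespace. The helper name `drop_colon_words` and the sample vector `x` are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: drop every whitespace-delimited token that contains a colon.
drop_colon_words <- function(x) {
  # \\S*:\\S* = a run of non-whitespace characters with a colon somewhere in it
  # \\s*      = any whitespace that follows the token
  out <- gsub("\\S*:\\S*\\s*", "", x)
  trimws(out)  # remove the space left behind when the token was the last word
}

# Illustrative input: the colon word appears first, in the middle, and last.
x <- c("wa119:d Here comes the first row",
       "Here comes ax21:3 the second row",
       "Here comes the third row bC230:13")

drop_colon_words(x)
```

Because this variant treats any non-whitespace run containing a colon as a word, it should only be applied to the Description column, not to whole lines that also hold IDs or timestamps.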

Answer 2

Suppose the column you want to modify is dat:

dat <- c("wa119:d Here comes the first row",
         "ax21:3 Here comes the second row",
         "bC230:13 Here comes the third row")

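Then you can take each element, split it into words, remove the words that contain a colon, and paste what is left back together, which gives you what you want: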

dat_colon_words_removed <- unlist(lapply(dat, function(string){
  words <- strsplit(string, split=" ")[[1]]
  words <- words[!grepl(":", words)]
  paste(words, collapse=" ")
}))
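To show how the split-filter-paste idea might be applied to a data frame column, here is a short sketch. The data frame `df` and the helper name `remove_colon_words` are illustrative assumptions. `vapply` is swapped in for `unlist(lapply(...))` only so that the result is guaranteed to be a character vector with one entry per input.

```r
# Hypothetical wrapper around the split / filter / paste steps.
remove_colon_words <- function(strings) {
  vapply(strings, function(string) {
    # Split on single spaces, treating the separator as a literal string.
    words <- strsplit(string, split = " ", fixed = TRUE)[[1]]
    # Keep only the words without a colon and glue them back together.
    paste(words[!grepl(":", words, fixed = TRUE)], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# Illustrative data frame mirroring the question's sample data.
df <- data.frame(
  Description = c("wa119:d Here comes the first row",
                  "ax21:3 Here comes the second row",
                  "bC230:13 Here comes the third row"),
  ID = c("id_112", "id_115", "id_234"),
  stringsAsFactors = FALSE
)

df$Description <- remove_colon_words(df$Description)
df$Description
```

Unlike the regex approaches, this removes a colon word wherever it sits in the string, at the cost of assuming that words are separated by exactly one space.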

Answer 3

Another solution that exactly matches the OP's expected result could be:

#data
df <- read.table(text = "Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id:234;2018/03/02", stringsAsFactors = FALSE, sep="\n")

gsub("[a-zA-Z0-9]+:[a-zA-Z0-9]+\\s", "", df$V1)

#[1] "Description;ID;Date"                        
#[2] "Here comes the first row;id_112;2018/03/02" 
#[3] "Here comes the second row;id_115;2018/03/02"
#[4] "Here comes the third row;id:234;2018/03/02"
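
In the output above, id:234 survives only because it is followed by ; rather than whitespace. If the colon word is always the first token of a line, anchoring the pattern with ^ makes that intent explicit. A minimal sketch, assuming that layout (the vector `lines` is made up for illustration):

```r
# Illustrative lines: a header, a prefixed row, and a row with a colon word mid-text.
lines <- c("Description;ID;Date",
           "bC230:13 Here comes the third row;id:234;2018/03/02",
           "No prefix here but a:b later;id_1;2018/03/02")

# sub() replaces only the first match; ^ restricts that match to the start of the line.
cleaned <- sub("^[a-zA-Z0-9]+:[a-zA-Z0-9]+\\s+", "", lines)
cleaned
```

With the anchor, only the leading token of the second line is removed, and the a:b in the third line is left untouched. The unanchored gsub version would remove it as well.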
