I would like to take a data frame df with a column df$col containing:
I?m tired
You?re tired
You?re tired?
Are you tired?
?I am tired
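and replace the question marks that appear between letters with apostrophes, and remove the question mark that appears at the start of a string, so that I get: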
I'm tired
You're tired
You're tired?
Are you tired?
I am tired
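I would use sub for the question mark at the start and gsub for the others, since a string may contain several question marks between words but only one at the start: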
gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))
[1] "I'm tired" "You're tired" "You're tired?" "Are you tired?"
[5] "I am tired"
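For an explanation, see https://regex101.com/r/jClVPg/1.
Some explanation: sub("^\\?", "", df$col) runs first and removes a question mark at the start of each string (^ anchors the match to the start of the string). gsub("(\\w)\\?(\\w)", "\\1'\\2", ...) then finds every question mark with a word character on each side, captures those two characters as groups 1 and 2, and writes them back with an apostrophe between them. A question mark at the end of a string has no word character after it, so it is left unchanged.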
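One subtlety of the capture-group pattern is that each match consumes the characters on both sides of the question mark, so two matches cannot share a character. The sketch below compares it with a lookaround variant (an alternative of mine, not part of the answer) that only asserts the neighbouring word characters. The extra string "a?b?c" is a hypothetical input added to show the difference; on the question's own strings both versions give the same result.

```r
# The question's strings plus a hypothetical "a?b?c", where a single
# character sits between two question marks
x <- c("I?m tired", "You?re tired", "You?re tired?", "Are you tired?",
       "?I am tired", "a?b?c")

# Capture-group version from the answer: the match "a?b" consumes "b",
# so the second "?" has no captured left neighbour and stays as is
capture <- gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", x))

# Lookaround version (PCRE, hence perl = TRUE): the neighbours are asserted
# with (?<=\\w) and (?=\\w) but not consumed, so "b" can serve both matches
lookaround <- gsub("(?<=\\w)\\?(?=\\w)", "'", sub("^\\?", "", x), perl = TRUE)

print(capture)
print(lookaround)
```

For the data in the question either form works; the lookaround form only matters if single characters can sit between question marks.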
We can use sub:
df$col <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))
df$col
#[1] "I'm tired" "You're tired" "You're tired?" "Are you tired?" "I am tired"
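Here we assume that, as in the example, each string has at most one question mark that is not its last character. Otherwise, just replace the inner sub with gsub.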
df <- structure(list(col = c("I?m tired", "You?re tired", "You?re tired?",
"Are you tired?", "?I am tired")), .Names = "col",
class = "data.frame", row.names = c(NA, -5L))
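To illustrate the assumption in the second answer, the sketch below runs both the sub and the gsub form of the inner call on a hypothetical string with a leading question mark and two inner ones. The input strings are made up for illustration and are not from the question.

```r
# Hypothetical inputs: the first has a leading "?" and two inner "?"s
x <- c("?I?m sure you?re tired", "You?re tired?")

# Answer as written: the inner sub() replaces only the first "?" that is
# not the last character, here the leading one, which the outer sub() drops
one <- sub("^'", "", sub("[?](?!$)", "'", x, perl = TRUE))

# Inner call switched to gsub(): every "?" that is not the last character
# becomes "'", then a leading "'" is removed
every <- sub("^'", "", gsub("[?](?!$)", "'", x, perl = TRUE))

print(one)
print(every)
```

Note that [?](?!$) turns any question mark that is not the final character into an apostrophe, including one followed by a space (for example "Tired? Yes" would become "Tired' Yes"), so this form fits data shaped like the example rather than arbitrary text.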