I have a set of strings, each of which contains the character "X":
c("KGDDQSXQGGAPDAGQE", "TEEDSEEVXEQK", "LTXTSGETTQTHTEPTGDSK", "IXTHNSEVEEDDMDK", "SXENPEEDEDQRNPAK", "XTAEHEAAQQDLQSK", "ATVIXHGETLRRTK", "XAVAREESGKPGAHVTVK", "YHTINGHNAEVXK", "XAAEDDEDDDVDTK")
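I want to get a character vector in which every element has 11 characters, with "X" in the center and 5 characters on each side. If one side has fewer than 5 characters, pad that side with "x" instead.
For example, "KGDDQSXQGGAPDAGQE" becomes "GDDQSXQGGAP",
"TEEDSEEVXEQK" becomes "DSEEVXEQKxx", and
"LTXTSGETTQTHTEPTGDSK" becomes "xxxLTXTSGET".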
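Assuming every string must contain "X":
- strsplit the string on X. You now have two separate strings; call them Left and Right.
- If Left is shorter than 5 characters, pad it with x. Do the same for Right.
- If Left is longer than 5 characters, keep only 5 characters. Do the same for Right.
- paste them back together.
vec <- c("KGDDQSXQGGAPDAGQE", "TEEDSEEVXEQK",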
"LTXTSGETTQTHTEPTGDSK", "IXTHNSEVEEDDMDK",
"SXENPEEDEDQRNPAK", "XTAEHEAAQQDLQSK",
"ATVIXHGETLRRTK", "XAVAREESGKPGAHVTVK",
"YHTINGHNAEVXK", "XAAEDDEDDDVDTK")
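myf <- function(v){
  # Split on the literal "X" into the text to its left and to its right
  v <- unlist(strsplit(v, 'X', fixed = TRUE))
  Left <- v[1]
  Right <- v[2]
  # strsplit() drops a trailing empty piece, so a string ending in "X" has no v[2]
  if(is.na(Right)) Right <- ''
  if(nchar(Left) > 5){
    # Keep only the 5 characters closest to "X"
    Left <- substr(Left, nchar(Left)-4, nchar(Left))
  } else {
    # Pad on the left with "x" up to 5 characters
    Left <- paste0(strrep('x', 5-nchar(Left)), Left)
  }
  if(nchar(Right) > 5){
    Right <- substr(Right, 1, 5)
  } else {
    # Pad on the right with "x" up to 5 characters
    Right <- paste0(Right, strrep('x', 5-nchar(Right)))
  }
  paste0(Left, 'X', Right)
}
sapply(vec, myf)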
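One detail of the strsplit approach that is easy to miss: base R's strsplit() returns an empty first piece when the match is at the start of a string, but returns no trailing empty piece when the match is at the end. The short check below illustrates this with two made-up strings ("XAB" and "ABX" are not from the question), so a string ending in "X" yields only one piece and the missing right side has to be treated as empty.

```r
# Match at the start: the first piece is an empty string
lead <- strsplit("XAB", "X", fixed = TRUE)[[1]]

# Match at the end: no trailing empty piece is returned
trail <- strsplit("ABX", "X", fixed = TRUE)[[1]]

print(length(lead))   # two pieces
print(length(trail))  # one piece, so trail[2] is NA
```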
Here is one option:
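charcenter <- function(input, char="X", pad=5) {
  # Position of the first occurrence of `char` in each string (literal match)
  index <- regexpr(char, input, fixed=TRUE)
  # Take up to `pad` characters on each side, clipped to the string bounds
  extr <- substr(input, pmax(0, index-pad), pmin(index+pad, nchar(input)))
  # Number of "x" needed on each side to reach `pad` characters
  padl <- strrep("x", pmax(pad-index+1, 0))
  padr <- strrep("x", pmax(pad-nchar(input)+index, 0))
  paste0(padl, extr, padr)
}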
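We use regexpr() to find the first X, then use substr() to extract as much as possible around the X, and finally work out how much padding is needed on each side and add it.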
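To make the padding arithmetic concrete, here is a small worked sketch for a single string from the question. The variable names (s, n_left, n_right, core, result) are only illustrative and are not part of the answer's function; the idea is that the left side has index - 1 characters available and the right side has nchar(s) - index.

```r
s   <- "TEEDSEEVXEQK"   # one of the strings from the question
pad <- 5

# Position of the first "X" as a plain integer (match attributes dropped)
index <- as.integer(regexpr("X", s, fixed = TRUE))

# Characters available left of X: index - 1; right of X: nchar(s) - index
n_left  <- max(pad - (index - 1), 0)
n_right <- max(pad - (nchar(s) - index), 0)

# Extract what is available around X, then add the missing padding
core   <- substr(s, max(1, index - pad), min(index + pad, nchar(s)))
result <- paste0(strrep("x", n_left), core, strrep("x", n_right))

print(c(index = index, n_left = n_left, n_right = n_right))
print(result)
```

Here X sits at position 9 of 12, so the left side is already long enough and the right side is two characters short.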
Binding the result to the input gives:
input <- c("KGDDQSXQGGAPDAGQE", "TEEDSEEVXEQK", "LTXTSGETTQTHTEPTGDSK", "IXTHNSEVEEDDMDK", "SXENPEEDEDQRNPAK", "XTAEHEAAQQDLQSK", "ATVIXHGETLRRTK", "XAVAREESGKPGAHVTVK", "YHTINGHNAEVXK", "XAAEDDEDDDVDTK")
cbind(charcenter(input), input)
#                     input
#  [1,] "GDDQSXQGGAP" "KGDDQSXQGGAPDAGQE"
#  [2,] "DSEEVXEQKxx" "TEEDSEEVXEQK"
#  [3,] "xxxLTXTSGET" "LTXTSGETTQTHTEPTGDSK"
#  [4,] "xxxxIXTHNSE" "IXTHNSEVEEDDMDK"
#  [5,] "xxxxSXENPEE" "SXENPEEDEDQRNPAK"
#  [6,] "xxxxxXTAEHE" "XTAEHEAAQQDLQSK"
#  [7,] "xATVIXHGETL" "ATVIXHGETLRRTK"
#  [8,] "xxxxxXAVARE" "XAVAREESGKPGAHVTVK"
#  [9,] "HNAEVXKxxxx" "YHTINGHNAEVXK"
# [10,] "xxxxxXAAEDD" "XAAEDDEDDDVDTK"
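For comparison, a third way to get the same result is to pad first and cut afterwards. This is a different technique from the ones above, sketched under the assumption that each string contains an uppercase "X" and no lowercase "x" before it; center_by_padding is a hypothetical name chosen for this example. Adding pad filler characters to both ends guarantees that a window of pad characters on each side of X always exists, so a single substr() call is enough.

```r
# Hypothetical helper: pad both ends with "x", then cut a window around "X"
center_by_padding <- function(s, pad = 5) {
  filler <- strrep("x", pad)
  padded <- paste0(filler, s, filler)
  # Case-sensitive literal match, so the lowercase filler is never matched
  pos <- as.integer(regexpr("X", padded, fixed = TRUE))
  substr(padded, pos - pad, pos + pad)
}

examples <- c("KGDDQSXQGGAPDAGQE", "TEEDSEEVXEQK", "LTXTSGETTQTHTEPTGDSK")
centered <- center_by_padding(examples)
print(centered)
```

The function is vectorised because paste0(), regexpr() and substr() all operate elementwise on character vectors.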