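meth.genes is a character vector of gene names corresponding to the row names of the meth data frame. The match is one-to-many, however: one gene can map to multiple row names in meth.
After a series of downstream analyses, I matched these gene names one-to-one to another set of IDs (Ensembl IDs). I now want to match these Ensembl IDs back to the row names of meth, appending an incremental suffix (e.g. ".1", ".2") to a row name whenever it is duplicated.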
meth.genes <- genes.mapped$nearestGeneSymbol
bm.meth <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
filters = "hgnc_symbol", mart = ensembl, values=meth.genes)
idx.meth <- meth.genes %>% match(table = bm.meth$hgnc_symbol)
meth.ensembl <- bm.meth$ensembl_gene_id[bm.meth$hgnc_symbol %in% meth.genes]
rownames(meth) <- meth.ensembl
Traceback:
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
I am after something like this (pseudocode):
for i in length(duplicated(meth.ensembl)) {
paste0(rownames(meth), ".", i)
}
Data:
meth
> dput(meth[1:3,1:3])
structure(list(TCGA.2K.A9WE.01A = c(0.461440642939772, 0.143910373119058,
0.847164847154162), TCGA.2Z.A9J1.01A = c(0.595894468074615, 0.0807243779293262,
0.867305510246114), TCGA.2Z.A9J3.01A = c(0.553849599144766, 0.0642332527783939,
0.917290578229414)), row.names = c("cg00000029", "cg00000165",
"cg00000236"), class = "data.frame")
genes.mapped
> dput(genes.mapped[1:3,])
structure(list(queryHits = 1:3, subjectHits = c(17721L, 11282L,
20626L), distance = c(237L, 11879L, 0L), nearestGeneSymbol = c("RBL2",
"BARHL2", "VDAC3")), row.names = c("cg00000029", "cg00000165",
"cg00000236"), class = "data.frame")
idx.meth
> dput(idx.meth[1:3])
c(3185L, 361L, 4196L)
meth.genes
> dput(meth.genes[1:3])
c("RBL2", "BARHL2", "BARHL2")
meth.ensembl
> dput(meth.ensembl[1:3])
c("ENSG00000175899", "ENSG00000184389", "ENSG00000184389")
Expected output:
c("ENSG00000175899", "ENSG00000184389.1", "ENSG00000184389.2")
We can use ave for this:
ave(meth.ensemble, meth.genes,
FUN = function(z) if (length(z) == 1) z else paste(z, seq_along(z), sep = "."))
# [1] "ENSG00000175899" "ENSG00000184389.1" "ENSG00000184389.2"
Data
meth.genes <- c("RBL2", "BARHL2", "BARHL2")
meth.ensemble <- c("ENSG00000175899", "ENSG00000184389", "ENSG00000184389")
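The "invalid 'row.names' length" error in the question suggests that the ID vector does not have one element per row of meth; subsetting with %in% returns the matching rows of the lookup table rather than one ID per probe. A minimal base R sketch of the whole round trip follows, assuming a lookup table shaped like the getBM() result. The toy meth values and the two-row bm.meth table are illustrative stand-ins, with the symbol-to-ID pairs taken from the question's expected output.

```r
# Toy stand-ins for the objects in the question (values are illustrative).
meth <- data.frame(
  s1 = c(0.46, 0.14, 0.85),
  s2 = c(0.60, 0.08, 0.87),
  row.names = c("cg00000029", "cg00000165", "cg00000236")
)
meth.genes <- c("RBL2", "BARHL2", "BARHL2")

# Hypothetical lookup table with one row per gene symbol,
# shaped like the getBM() result in the question.
bm.meth <- data.frame(
  ensembl_gene_id = c("ENSG00000184389", "ENSG00000175899"),
  hgnc_symbol     = c("BARHL2", "RBL2")
)

# match() returns one index per element of meth.genes, so the result
# always has length nrow(meth), in the same order as the rows of meth.
meth.ensembl <- bm.meth$ensembl_gene_id[match(meth.genes, bm.meth$hgnc_symbol)]

# Add a numeric suffix only to IDs that occur more than once.
new.names <- ave(
  meth.ensembl, meth.ensembl,
  FUN = function(z) if (length(z) == 1) z else paste(z, seq_along(z), sep = ".")
)

rownames(meth) <- new.names
rownames(meth)
```

Gene symbols that are absent from the lookup table come back as NA from match(), and row names cannot be NA, so those rows would need to be dropped or given a placeholder before the final assignment.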
You can paste rowid() onto the IDs within a grouped operation on meth.genes:
library(data.table)
f <- \(e) if(length(e)>1) paste0(e,".", rowid(e)) else e
data.table(meth.genes, meth.ensemble)[,meth.ensemble:=f(meth.ensemble),meth.genes][]
Output:
meth.genes meth.ensemble
1: RBL2 ENSG00000175899
2: BARHL2 ENSG00000184389.1
3: BARHL2 ENSG00000184389.2
This can then be joined to meth.
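One possible way to do that join is sketched below. It assumes meth.genes and meth.ensemble are in the same order as the rows of meth; the probe column name and the toy meth values are made up for illustration.

```r
library(data.table)

# Toy stand-ins for the objects in the question (values are illustrative).
meth <- data.frame(
  s1 = c(0.46, 0.14, 0.85),
  s2 = c(0.60, 0.08, 0.87),
  row.names = c("cg00000029", "cg00000165", "cg00000236")
)
meth.genes    <- c("RBL2", "BARHL2", "BARHL2")
meth.ensemble <- c("ENSG00000175899", "ENSG00000184389", "ENSG00000184389")

# Suffix IDs only within groups that have more than one member.
f <- function(e) if (length(e) > 1) paste0(e, ".", rowid(e)) else e

# Carry the original probe IDs along as an explicit join key
# ("probe" is a hypothetical column name).
ids <- data.table(probe = rownames(meth), meth.genes, meth.ensemble)
ids[, meth.ensemble := f(meth.ensemble), by = meth.genes]

# Turn the row names of meth into a column and join on it,
# keeping the original row order.
meth.dt <- as.data.table(meth, keep.rownames = "probe")
joined  <- merge(ids, meth.dt, by = "probe", sort = FALSE)

# Convert back to a data frame keyed by the suffixed Ensembl IDs.
out <- as.data.frame(joined[, .SD, .SDcols = names(meth)])
rownames(out) <- joined$meth.ensemble
out
```

Joining on the probe ID rather than relying on position keeps the result correct even if either table is reordered along the way.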