I am trying to classify variants by their type (polymorphism, insertion, or deletion). Here is a snippet of my input:
A T
G C
T AG
C T
TT C
AT CC
Here is the dput output, in case you need it:
df = structure(c("A", "G", "T", "C", "TT", "AT", "T", "C", "AG", "T", "C", "CC"), dim = c(6L, 2L), dimnames = list(NULL, c("REF", "ALT")))
I want each variant classified by type, so the output should look like this:
A T SNV
G C SNV
T AG Insertion
C T SNV
TT C Deletion
AT CC MNV
I have tried this:
variants = data.frame(
REF = file@fix[, 4],
ALT = file@fix[, 5],
VARIANTTYPE =
if (nchar(file@fix[, 4]) == nchar(file@fix[, 5]))
{ifelse(nchar(file@fix[, 4]) == 1 & nchar(file@fix[, 5]) == 1, "Single nucleotide variant", "Multinucleotide variant")}
else if (nchar(file@fix[, 4]) < nchar(file@fix[, 5])) {"Insertion"}
else if (nchar(file@fix[, 4]) > nchar(file@fix[, 5])) {"Deletion"}
else {"nothing"})
But I get this error:
Error in if (nchar(file@fix[, 4]) == nchar(file@fix[, 5])) { :
the condition has length > 1
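The only solution I found online is to use ifelse, but ifelse takes only one condition, so I cannot classify all the variants in one go.
Do you have any ideas? Thank you for your help!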
You can use case_when from the dplyr package.
Like this:
df <- structure(c("A", "G", "T", "C", "TT", "AT", "T", "C", "AG", "T", "C", "CC"),
dim = c(6L, 2L),
dimnames = list(NULL, c("REF", "ALT")))
df <- as.data.frame(df)
library(dplyr)
library(stringr)
df |>
mutate(VARIANTTYPE = case_when(
# SNP when ALT length = REF length = 1,
str_length(ALT) == str_length(REF) & str_length(ALT) == 1 ~ "SNP",
# MNV when ALT length = REF length,
str_length(ALT) == str_length(REF) ~ "MNV",
# deletion when ALT length < REF length,
str_length(ALT) < str_length(REF) ~ "Deletion",
# and insertion when ALT length > REF length.
str_length(ALT) > str_length(REF) ~ "Insertion",
))
#> REF ALT VARIANTTYPE
#> 1 A T SNP
#> 2 G C SNP
#> 3 T AG Insertion
#> 4 C T SNP
#> 5 TT C Deletion
#> 6 AT CC MNV
Created on 2024-07-22 with reprex v2.1.0
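As for the error itself: if () expects a single TRUE or FALSE, whereas nchar(file@fix[, 4]) == nchar(file@fix[, 5]) returns one value per row, hence "the condition has length > 1". If you would rather stay in base R, ifelse calls can be nested, so more than one condition is possible. The following is only a minimal sketch on the toy data from the question; the helper names ref_len and alt_len are made up for illustration, and the labels follow the desired output in the question (SNV rather than SNP).

```r
# Toy data from the question; stringsAsFactors = FALSE keeps the columns as
# character vectors on older R versions, so nchar() works on them.
df <- data.frame(
  REF = c("A", "G", "T", "C", "TT", "AT"),
  ALT = c("T", "C", "AG", "T", "C", "CC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vectors: allele lengths, one element per row.
ref_len <- nchar(df$REF)
alt_len <- nchar(df$ALT)

# ifelse() is vectorised, so each test is evaluated row by row.
# Nesting plays the role of the else-if chain.
df$VARIANTTYPE <- ifelse(
  ref_len == 1 & alt_len == 1, "SNV",
  ifelse(ref_len == alt_len, "MNV",
         ifelse(ref_len < alt_len, "Insertion", "Deletion"))
)

df
```

With real data, REF and ALT could presumably be taken from file@fix[, 4] and file@fix[, 5] as in the question. Beyond two or three levels the nesting gets hard to read, which is where case_when is more convenient.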