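I'm trying to create a rule that assigns a specific color code to each unique string, so that I can plot different files consistently in ggplot2. For example, suppose I have two tab-delimited files, file1.txt and file2.txt, that look like this: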
FILE1.TXT
Freq Seq
90 AAGTGT
3 AAGTGG
3 AAGTCC
2 AATTTT
2 TTTTTT
FILE2.TXT
Freq Seq
91 AAGTGT
4 AAGTGG
2 AAGTCC
2 CCCCCC
1 TTTTTT
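With 6 distinct sequences (AAGTGT, AAGTGG, AAGTCC, CCCCCC, TTTTTT, AATTTT), a total of 6 different colors would be used for the files above. Across my many files I need ~3000 colors, and I have already created a palette (pal) to use: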
pal<-c(randomColor(count=2951))
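Is there a way to make sure every sequence keeps the same string-to-hex-color pairing across all of my files (i.e., every file that shows the AAGTGT sequence uses the same hex color code for that string)? Note that not all 3000 colors are represented in every file.
Thanks!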
Hope this helps!
library(ggplot2)
library(randomcoloR)
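# build a palette mapping from the 'Seq' values of all available data frames
set.seed(123)
all_seqs <- unique(c(as.character(df1$Seq), as.character(df2$Seq)))
# one color per unique sequence, so the count never has to be hard-coded
pal <- randomColor(count = length(all_seqs))
# keep the hex codes as character strings rather than factors
pal_seq_mapping <- data.frame(sequence = all_seqs, color = pal, stringsAsFactors = FALSE)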
# example plot on the 'df1' data frame
ggplot(df1, aes(x = Seq, y = Freq)) +
  geom_bar(stat = "identity", fill = pal_seq_mapping[match(df1$Seq, pal_seq_mapping$sequence), "color"]) +
  theme_bw()
# example plot on the 'df2' data frame
ggplot(df2, aes(x = Seq, y = Freq)) +
  geom_bar(stat = "identity", fill = pal_seq_mapping[match(df2$Seq, pal_seq_mapping$sequence), "color"]) +
  theme_bw()
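An alternative sketch, assuming the mapping is stored as a named character vector instead of a data frame: passing it to scale_fill_manual(values = ...) lets ggplot2 match colors to sequences by name, so no manual match() on row order is needed. The names seq_colors and plot_seq are made up for this illustration, and rainbow() merely stands in for randomColor() so that the example needs only ggplot2.

```r
library(ggplot2)

# Stand-ins for the two files shown in the question
df1 <- data.frame(
  Freq = c(90, 3, 3, 2, 2),
  Seq  = c("AAGTGT", "AAGTGG", "AAGTCC", "AATTTT", "TTTTTT"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Freq = c(91, 4, 2, 2, 1),
  Seq  = c("AAGTGT", "AAGTGG", "AAGTCC", "CCCCCC", "TTTTTT"),
  stringsAsFactors = FALSE
)

# One color per unique sequence across all data frames
all_seqs <- sort(unique(c(df1$Seq, df2$Seq)))
# rainbow() is a stand-in (assumption); randomColor(count = length(all_seqs))
# could be used in the same position
seq_colors <- setNames(rainbow(length(all_seqs)), all_seqs)

# Hypothetical helper: fill is mapped to Seq, and the named vector fixes the colors
plot_seq <- function(df) {
  ggplot(df, aes(x = Seq, y = Freq, fill = Seq)) +
    geom_col() +
    scale_fill_manual(values = seq_colors) +
    theme_bw()
}

p1 <- plot_seq(df1)
p2 <- plot_seq(df2)
```

Calling print(p1) and print(p2) draws the two plots; AAGTGT is filled with seq_colors[["AAGTGT"]] in both, even though each file contains a different subset of the sequences.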
#sample data
> dput(df1)
structure(list(Freq = c(90L, 3L, 3L, 2L, 2L), Seq = structure(c(3L,
2L, 1L, 4L, 5L), .Label = c("AAGTCC", "AAGTGG", "AAGTGT", "AATTTT",
"TTTTTT"), class = "factor")), .Names = c("Freq", "Seq"), class = "data.frame", row.names = c(NA,
-5L))
> dput(df2)
structure(list(Freq = c(91L, 4L, 2L, 2L, 1L), Seq = structure(c(3L,
2L, 1L, 4L, 5L), .Label = c("AAGTCC", "AAGTGG", "AAGTGT", "CCCCCC",
"TTTTTT"), class = "factor")), .Names = c("Freq", "Seq"), class = "data.frame", row.names = c(NA,
-5L))
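For the many-file case in the question, the same mapping can be built once from every file on disk. A minimal sketch, assuming each file is tab-delimited with a header row that contains a Seq column; the temporary directory, the file contents written below, and the names seq_colors and colors_by_file are illustrative only, and rainbow() again stands in for the randomColor() palette.

```r
# Write two small tab-delimited files to a temporary directory (illustration only)
tmp_dir <- tempfile("seqfiles")
dir.create(tmp_dir)
writeLines(c("Freq\tSeq", "90\tAAGTGT", "3\tAAGTGG", "3\tAAGTCC", "2\tAATTTT", "2\tTTTTTT"),
           file.path(tmp_dir, "file1.txt"))
writeLines(c("Freq\tSeq", "91\tAAGTGT", "4\tAAGTGG", "2\tAAGTCC", "2\tCCCCCC", "1\tTTTTTT"),
           file.path(tmp_dir, "file2.txt"))

# Read every .txt file in the directory into a named list of data frames
files  <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
tables <- lapply(files, read.delim, stringsAsFactors = FALSE)
names(tables) <- basename(files)

# Collect the unique sequences across all files and give each one a color
all_seqs   <- sort(unique(unlist(lapply(tables, function(d) d$Seq))))
seq_colors <- setNames(rainbow(length(all_seqs)), all_seqs)

# Look up the colors each file needs; shared sequences get identical codes
colors_by_file <- lapply(tables, function(d) seq_colors[d$Seq])
```

Each element of colors_by_file holds only the colors for the sequences present in that file, so a sequence missing from one file simply does not appear in its lookup.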