How can I visualize the complete alignment of two sequences?
library(Biostrings)
s1 <-DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAGAAGACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
s2 <-DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATGTTTCACTACTTCCTTTCGGGTAAGTGTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATCAAATATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC")
pairwiseAlignment(s1,s2)
Output:
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGT--TTTCAC---...CTTCACCAGCTCCCTGGCGGTAAGTTG-ATCAAAGG---AAACGCAAAGTTTTCAAG
subject: [1] GTTTCACTACTTCCTTTCGGGTAAGTAAAT-ATATGTTTCACTACTTCCTTTCGGGTA...TATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC
score: -394.7115
Only part of the alignment is shown here. Do you know of any existing function for plotting or printing the full alignment?
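You can find information on how to extract the aligned pattern and subject sequences under ?pairwiseAlignments.
Here is an example based on the sample data you provided: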
First, store the alignment in a PairwiseAlignmentsSingleSubject object:
alg <- pairwiseAlignment(s1,s2)
Then store the aligned pattern and subject sequences in a DNAStringSet object:
seq <- c(alignedPattern(alg), alignedSubject(alg))
Finally, use as.character to access the full sequences:
as.character(seq)
[1] "ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGT--TTTCAC--------TTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAGAAGACTTCACCAGCTCCCTGGCGGTAAGTTG-ATCAAAGG---AAACGCAAAGTTTTCAAG"
[2] "GTTTCACTACTTCCTTTCGGGTAAGTAAAT-ATATGTTTCACTACTTCCTTTCGGGTAAGTGTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATCAA-ATATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC"
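If two very long strings are hard to read in the console, one option is to wrap them into fixed-width blocks with a match line in between. The following is only a minimal base-R sketch: print_alignment is a hypothetical helper name (it is not part of Biostrings), and the toy strings are made up for illustration.

```r
# Print two equal-length aligned strings in blocks of `width` columns,
# with a middle line that marks identical, non-gap positions with "|".
# NOTE: print_alignment is a hypothetical helper, not a Biostrings function.
print_alignment <- function(pattern, subject, width = 60) {
  stopifnot(nchar(pattern) > 0, nchar(pattern) == nchar(subject))
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  # "|" where both characters agree and are not gaps, " " otherwise
  m <- ifelse(p == s & p != "-", "|", " ")
  lines <- character(0)
  for (i in seq(1, length(p), by = width)) {
    idx <- i:min(i + width - 1, length(p))
    lines <- c(lines,
               paste0("pattern: ", paste(p[idx], collapse = "")),
               paste0("         ", paste(m[idx], collapse = "")),
               paste0("subject: ", paste(s[idx], collapse = "")),
               "")
  }
  cat(lines, sep = "\n")
  invisible(lines)
}

# Toy aligned strings (made up, not taken from the question)
aln <- c("ACGT-ACGTTA", "ACGTTAC-TTA")
out <- print_alignment(aln[1], aln[2], width = 6)
```

With the objects from this answer, the call would presumably be print_alignment(as.character(seq)[1], as.character(seq)[2]).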
It seems that alignedPattern and alignedSubject were only recently added to Biostrings. Alternatively, you can do
seq <- c(aligned(pattern(alg)), aligned(subject(alg)))
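Note, however, that this trims the globally aligned sequences (see the documentation for details).
For visualization, there is the Bioconductor package DECIPHER, which provides a way to view XStringSet data in a web browser. It automatically adds color coding and a consensus sequence at the bottom. In your case, you would do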
library(DECIPHER)
BrowseSeqs(seq)