How do I concatenate sequences so that I can build a dictionary?

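In Python, I have been asked to create a dictionary from the contents of a string (FASTA sequences), where the alphanumeric part (the identifier) is the key and the letters (the sequence) are the value. I was first asked to make a list of the identifiers, so I split the string into lines so that I could isolate them. Now I don't know how to go on to join the pieces of each sequence without losing any part of the sequence or the identifier. The sequence is not imported; it was posted as-is by my professor, so please don't suggest changing it. Can anyone help me figure out what I should do? It should be fairly simple, since I am only a few weeks into my school's Python course.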

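I tried using join in a for loop, but since I need it to update the list I am working with, I believe it did not work, because there was no output. There was no error message either. I also tried replacing the newlines with nothing in order to join the pieces that make up a sequence, but then, when it comes to separating the sequence from the identifier, I lose information (specifically the 1 and the first letter of the sequence). I also need to remove several newline characters within the sequences, and I cannot alter the original sequence in any way. I have attached my code, but at the moment none of it accomplishes what I am asking about, because I had to start over.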

ESTs ='''>AA417440 1
tattagcctttgtttcgtaatccttactgttaatggtttgttcaataccg
tncttggattttttagccttttcaagtttcttttgaactttcgcaatttc
agcatcaatatcaacgtgtcccttgacgagaagatgtacattgacttctg
ggttaacagattgcaatacgcaaccttctggaatttcggaagcatcacga
acaacagtgacttcgtcgatggccttgatcaacgagacaatagaatcttt
ctgatcttcagcagnttggngnattcctcgtggttagattcaacgaaaac
cttaccattcttcaaantattgttctcagataacaaggaacgagcttctt
tggtantgttcaagancaagtcgtaagcattggtcgatttgacatcancg
tnctcagatacgtaaactggataagaagctttttcaatngagggaggctt
ctcaagtggaacgctttgggnagtcttttggcacatttcttccaga
>AA417441 1
ttagatcatttaatgacctcggagaactgttcctagaattactcctggaa
ctactagcactgttgctagtggtattatttgctgtgcttgaatcaaaaat
cctcaattttttcataaattggccgccaaagccagtattactagagccgt
tgctattgctgtccaagaggtccttcaattcttcgttgtggagttcaatg
aacgaacattttactacgtaatcgttctgttgtaggtccaatgtgtcaaa
caacttcaaaagaaccctcggcattattcctgctggatcg
>AA417442 1
taagtaggttcaaatcaggcactgtcaaagaccgatgcatgatttgaagt
cctcgctgatcaccattggcatagttctgaaatagggtttcggctgactt
tacggagaaatatgaatgaatagatttaccaacaaagaacaaattggtta
ccgcagatatcaaaagacaatagtataaagttttgttaacttttgaaaaa
catttgaagataaccattaccattgccagtaaaactttcatggtcttaat
gatgacactgcttaaataaaacacggtcaattttgtaaaaatttttgaaa
gagacagaaaatcataattgatatatacaggttgcatgaaatcgccggcc
tttncatcgttttctattctttcctttggtaaaacaccaaggaatccaca
caatttaataacatcactcattttttctctatcgttgaatttttttagat
attccctcgtctttcttgttaatttcagaatgtccaagtttacaatatct
caaatgccagttgaggt
>AA417443 1
tatcacaattgcttttttgagaagccaaagagctgattggtgaggttgaa
ggtgtccccacaccttcactttcgattcttctccttttatttggtaaggc
ccacgtcgacgtatcgaatttgtgttttcttgtatccgaggaataatttt
cacgtagaccatatggtacgtcactgctcctacttccagaacttctgctc
tcaacgtcgctgttaccacttgcttctacttcagaaccatcaaaattgcg
aggttcgtttttcacaaatgtgtgccataagtacttacctgagttccaaa
gagtttctcttatgcctcctataaaaccagctttactagctctatctgac
ccatctttcattgattcttccattgataaaacacggcgggcgtttaaata
attgaatcttgacgtattggg
'''

# Converting the string to a list of lines (splitlines() also drops the newline characters)
ESTlist = ESTs.splitlines()
print(ESTlist)  # used to check that the last command worked

#Creating list of definitions
deflines = []
for x in ESTlist:
    if ">" in x:
        deflines.append(x)

print("Definition Lines:",deflines)

#Creating dictionary from ESTlist

The output I want is:

dictionary = {'>AA417440 1': 'tattagcctttgtttcgtaatccttactgttaatggtttgttcaataccgtncttggattttttagccttttcaagtttcttttgaactttcgcaatttcagcatcaatatcaacgtgtcccttgacgagaagatgtacattgacttctgggttaacagattgcaatacgcaaccttctggaatttcggaagcatcacgaacaacagtgacttcgtcgatggccttgatcaacgagacaatagaatctttctgatcttcagcagnttggngnattcctcgtggttagattcaacgaaaaccttaccattcttcaaantattgttctcagataacaaggaacgagcttctttggtantgttcaagancaagtcgtaagcattggtcgatttgacatcancgtnctcagatacgtaaactggataagaagctttttcaatngagggaggcttctcaagtggaacgctttgggnagtcttttggcacatttcttccaga',
'>AA417441 1': 'ttagatcatttaatgacctcggagaactgttcctagaattactcctggaactactagcactgttgctagtggtattatttgctgtgcttgaatcaaaaatcctcaattttttcataaattggccgccaaagccagtattactagagccgttgctattgctgtccaagaggtccttcaattcttcgttgtggagttcaatgaacgaacattttactacgtaatcgttctgttgtaggtccaatgtgtcaaacaacttcaaaagaaccctcggcattattcctgctggatcg',
...}

and so on.

python conditional-statements
1 Answer

One possible solution is to use a regular expression:

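import re

# Each match is a (header line, raw sequence block) pair:
#   (>[^\n]+)   the header, from ">" to the end of its line
#   (.*?)       the lines that follow, newlines included (re.S lets "." match "\n")
#   (?=>|\Z)    stop just before the next ">" or at the end of the string
out = {}
for k, v in re.findall(r"(>[^\n]+)\n(.*?)(?=>|\Z)", ESTs, flags=re.S):
    out[k] = v.replace("\n", "")  # join the sequence lines into one string

print(out)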

Prints:

{
    ">AA417440 1": "tattagcctttgtttcgtaatccttactgttaatggtttgttcaataccgtncttggattttttagccttttcaagtttcttttgaactttcgcaatttcagcatcaatatcaacgtgtcccttgacgagaagatgtacattgacttctgggttaacagattgcaatacgcaaccttctggaatttcggaagcatcacgaacaacagtgacttcgtcgatggccttgatcaacgagacaatagaatctttctgatcttcagcagnttggngnattcctcgtggttagattcaacgaaaaccttaccattcttcaaantattgttctcagataacaaggaacgagcttctttggtantgttcaagancaagtcgtaagcattggtcgatttgacatcancgtnctcagatacgtaaactggataagaagctttttcaatngagggaggcttctcaagtggaacgctttgggnagtcttttggcacatttcttccaga",
    ">AA417441 1": "ttagatcatttaatgacctcggagaactgttcctagaattactcctggaactactagcactgttgctagtggtattatttgctgtgcttgaatcaaaaatcctcaattttttcataaattggccgccaaagccagtattactagagccgttgctattgctgtccaagaggtccttcaattcttcgttgtggagttcaatgaacgaacattttactacgtaatcgttctgttgtaggtccaatgtgtcaaacaacttcaaaagaaccctcggcattattcctgctggatcg",
    ">AA417442 1": "taagtaggttcaaatcaggcactgtcaaagaccgatgcatgatttgaagtcctcgctgatcaccattggcatagttctgaaatagggtttcggctgactttacggagaaatatgaatgaatagatttaccaacaaagaacaaattggttaccgcagatatcaaaagacaatagtataaagttttgttaacttttgaaaaacatttgaagataaccattaccattgccagtaaaactttcatggtcttaatgatgacactgcttaaataaaacacggtcaattttgtaaaaatttttgaaagagacagaaaatcataattgatatatacaggttgcatgaaatcgccggcctttncatcgttttctattctttcctttggtaaaacaccaaggaatccacacaatttaataacatcactcattttttctctatcgttgaatttttttagatattccctcgtctttcttgttaatttcagaatgtccaagtttacaatatctcaaatgccagttgaggt",
    ">AA417443 1": "tatcacaattgcttttttgagaagccaaagagctgattggtgaggttgaaggtgtccccacaccttcactttcgattcttctccttttatttggtaaggcccacgtcgacgtatcgaatttgtgttttcttgtatccgaggaataattttcacgtagaccatatggtacgtcactgctcctacttccagaacttctgctctcaacgtcgctgttaccacttgcttctacttcagaaccatcaaaattgcgaggttcgtttttcacaaatgtgtgccataagtacttacctgagttccaaagagtttctcttatgcctcctataaaaccagctttactagctctatctgacccatctttcattgattcttccattgataaaacacggcgggcgtttaaataattgaatcttgacgtattggg",
}
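
If regular expressions have not come up in the course yet, the same dictionary can probably be built with a plain loop over the lines. The version below is only a minimal sketch: `fasta_to_dict` and the tiny `sample` string are hypothetical names made up for illustration, and it assumes every header line starts with ">" and that everything up to the next header belongs to that record. It also shows why a bare join in a loop appears to do nothing: `str.join` returns a new string and does not change the list, so its result has to be stored somewhere.

```python
def fasta_to_dict(text):
    """Map each '>' header line to its sequence lines joined into one string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record begins: remember its header and start collecting pieces
            header = line
            records[header] = []
        elif header is not None:
            # A sequence line: it belongs to the most recent header
            records[header].append(line)
    # "".join(...) returns a new string, so its result is stored in the new dict
    return {key: "".join(parts) for key, parts in records.items()}


# Tiny made-up sample in the same layout as the ESTs string
sample = """>ID1 1
acgt
ttga
>ID2 1
ggcc
"""

result = fasta_to_dict(sample)
print(result)
# {'>ID1 1': 'acgtttga', '>ID2 1': 'ggcc'}
```

With the string from the question, `fasta_to_dict(ESTs)` should give the same keys and values as the regular-expression version, and the original `ESTs` string is left untouched.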