The Python code below gives me the output I want, but I need help limiting the results to the first 20 rows.
A sample of the input is shown below:
    >gi|170079688|ref|YP_001729008.1| bifunctional riboflavin kinase/FMN adenylyltransferase [Escherichia coli str. K-12 substr. DH10B]
    MKLIRGIHNLSQAPQEGCVLTIGNFDGVHRGHRALLQGLQEEGRKRNLPVMVMLFEPQPLELFATDKAPA
    RLTRLREKLRYLAECGVDYVLCVRFDRRFAALTAQNFISDLLVKHLRVKFLAVGDDFRFGAGREGDFLLL
    QKAGMEYGFDITSTQTFCEGGVRISSTAVRQALADDNLALAESLLGHPFAISGRVVHGDELGRTIGFPTA
    NVPLRRQVSPVKGVYAVEVLGLGEKPLPGVANIGTRPTVAGIRQQLEVHLLDVAMDLYGRHIQVVLRKKI
    RNEQRFASLDELKAQIARDELTAREFFGLTKPA
    >gi|170079689|ref|YP_001729009.1| isoleucyl-tRNA synthetase [Escherichia coli str. K-12 substr. DH10B]
    MSDYKSTLNLPETGFPMRGDLAKREPGMLARWTDDDLYGIIRAAKKGKKTFILHDGPPYANGSIHIGHSV
    NKILKDIIVKSKGLSGYDSPYVPGWDCHGLPIELKVEQEYGKPGEKFTAAEFRAKCREYAATQVDGQRKD
    FIRLGVLGDWSHPYLTMDFKTEANIIRALGKIIGNGHLHKGAKPVHWCVDCRSALAEAEVEYYDKTSPSI
    DVAFQAVDQDALKAKFAVSNVNGPISLVIWTTTPWTLPANRAISIAPDFDYALVQIDGQAVILAKDLVES
    VMQRIGVTDYTILGTVKGAELELLRFTHPFMGFDVPAILGDHVTLDAGTGAVHTAPGHGPDDYVIGQKYG
    LETANPVGPDGTYLPGTYPTLDGVNVFKANDIVVALLQEKGALLHVEKMQHSYPCCWRHKTPIIFRATPQ
    WFVSMDQKGLRAQSLKEIKGVQWIPDWGQARIESMVANRPDWCISRQRTWGVPMSLFVHKDTEELHPRTL
    ELMEEVAKRVEVDGIQAWWDLDAKEILGDEADQYVKVPDTLDVWFDSGSTHSSVVDVRPEFAGHAADMYL
    EGSDQHRGWFMSSLMISTAMKGKAPYRQVLTHGFTVDGQGRKMSKSIGNTVSPQDVMNKLGADILRLWVA
    STDYTGEMAVSDEILKRAADSYRRIRNTARFLLANLNGFDPAKDMVKPEEMVVLDRWAVGCAKAAQEDIL
    KAYEAYDFHEVVQRLMRFCSVEMGSFYLDIIKDRQYTAKADSVARRSCQTALYHIAEALVRWMAPILSFT
    ADEVWGYLPGEREKYVFTGEWYEGLFGLADSEAMNDAFWDELLKVRGEVNKVIEQARADKKVGGSLEAAV
    TLYAEPELSAKLTALGDELRFVLLTSGATVADYNDAPADAQQSEVLKGLKVALSKAEGEKCPRCWHYTQD
    VGKVAEHAEICGRCVSNVAGDGEKRKFA
    >gi|170079690|ref|YP_001729010.1| lipoprotein signal peptidase [Escherichia coli str. K-12 substr. DH10B]
    MSQSICSTGLRWLWLVVVVIDIDLGSKYLILQNFALGDTVPLFPSLNLHYARNYGAAFSFLADSGGWQRW
    FFAGIAIGISVILAVMMYRSKATQKLNNIAYALIIGGALGNLFDRLWHGFVVDMIDFYVGDWHFATFNLA
    DTAICVGAALIVLEGFLPSRAKKQ
    import re

    id = None
    header = None
    seq = ''
    a_file = open('e_coli.faa')
    for line in a_file:
        # A header line looks like ">identifier description"
        m = re.match(r">(\S+)\s+(.+)", line.rstrip())
        if m:
            # A new header means the previous record is complete
            if id is not None:
                print("{0} length:{1} {2}".format(id, len(seq),header))
            id, header = m.groups()
            seq = ''
        else:
            seq += line.rstrip()
At the very top, add c = 0. Then change

    print("{0} length:{1} {2}".format(id, len(seq),header))

to

    if c < 20:
        print("{0} length:{1} {2}".format(id, len(seq),header))
        c += 1
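As a rough sketch, the counter idea above can be folded into a single function. The names `first_records` and `limit` are made up for illustration and do not come from the question. One deliberate difference, stated as an assumption about what is wanted: this version also reports the last record in the input, which the original loop never prints because no further header follows it.

```python
import re

# Header lines look like ">identifier description"
HEADER = re.compile(r">(\S+)\s+(.+)")

def first_records(lines, limit=20):
    """Return at most `limit` summary strings, one per FASTA record."""
    if limit <= 0:
        return []
    results = []
    rec_id, header, seq = None, None, ''
    for line in lines:
        m = HEADER.match(line.rstrip())
        if m:
            # A new header closes the previous record, if there was one
            if rec_id is not None:
                results.append(
                    "{0} length:{1} {2}".format(rec_id, len(seq), header))
                if len(results) == limit:
                    return results  # stop reading as soon as we have enough
            rec_id, header = m.groups()
            seq = ''
        else:
            seq += line.rstrip()
    # Flush the final record, which has no following header to trigger it
    if rec_id is not None:
        results.append("{0} length:{1} {2}".format(rec_id, len(seq), header))
    return results
```

With the file from the question, this could be used as `with open('e_coli.faa') as f: print("\n".join(first_records(f)))`. Returning early also avoids reading the rest of a large file once 20 records have been collected.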
To read only the first 20 lines of the file, you can use readlines(). Instead of:

    for line in a_file:

use:

    for line in a_file.readlines()[:20]:
Use the slice operator:

    with open('e_coli.faa', 'r') as f:
        content = f.readlines()
    for line in content[:20]:
        print(line)
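Both readlines() variants load the whole file into memory before slicing. A possible alternative, sketched here with the standard-library `itertools.islice`, stops after the requested number of lines. The helper name `head_lines` and the in-memory sample are assumptions for illustration only. Note that, like the slicing answers, this limits the number of input lines read, not the number of records printed.

```python
import io
from itertools import islice

def head_lines(handle, n=20):
    """Return the first `n` lines of an open text handle, newlines removed."""
    # islice pulls lines lazily, so nothing past line n is read
    return [line.rstrip("\n") for line in islice(handle, n)]

# Stand-in for open('e_coli.faa'): 30 numbered lines held in memory
sample = io.StringIO("".join("line {}\n".format(i) for i in range(1, 31)))
first = head_lines(sample)
print(len(first), first[0], first[-1])
```

With a real file the call would look like `with open('e_coli.faa') as f: first = head_lines(f)`.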
Use enumerate():

    for idx, line in enumerate(a_file):
        if idx < 20:
            m = re.match(r">(\S+)\s+(.+)", line.rstrip())
            if m:
                if id is not None:
                    print("{0} length:{1} {2}".format(id, len(seq),header))
                id, header = m.groups()
                seq = ''
            else:
                seq += line.rstrip()
        else:
            # Stop once 20 lines have been processed
            break