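I'm looking for amino acids in the sequences of a FASTA file. Specifically, I'm checking whether the 564th amino acid of a sequence is V, I, or E. The file being accessed is read in and processed as part of another function. Here is the code: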
import re

with open("/Users/adia/Downloads/FGFR_fusion.fasta",'r') as fasta:
    test = fasta.read()
    test = test.split(">")
    del test[0]
out = open("/Users/adia/Desktop/PSB HW_6.txt", 'w')
for x in test:
    if 'fibroblast growth factor receptor 1 isoform' in x:
        out.write(x)
with open("/Users/adia/Desktop/PSB HW_6.txt", 'r') as filtered:
    test2 = filtered.read()
out = open("/Users/adia/Desktop/564ID.txt", 'w')
AA564 = re.compile('^M.{562}[IVE]')
matches = re.finditer(AA564,test2)
found = []
for match in matches:
    found.append(match.group(0))
    print(f"Found {match.group(0)} at position {match.start()}.\n")
if found == []:
    print("No match found.")
Output:
No match found.
I know there is at least one match (I read through and separated each sequence myself), but for some reason this doesn't work.
I've tried many variations of this regex.
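It may have something to do with how the file is formatted when it is first written out, since that file is the input to the part I'm having trouble with. But I checked, and both when it is read out and when it is returned, it is a string. I can't edit it any further, or I won't be able to use the regex. I tried regex101 to see where the difference is, but nothing stands out. Can anyone offer insight into how to fix this? TIA!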
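Two details of the question's code can make the pattern come up empty regardless of the regex itself. First, `out` is never closed before the same path is reopened for reading, so buffered writes may not have reached the file yet. Second, the text that is read back still starts with the description line (`split(">")` removes only the `>`), and FASTA sequences are usually wrapped over several lines. The sketch below illustrates the second point on a made-up 800-residue record (not real FGFR1 data); the header suffix `X` and the 60-character line width are assumptions chosen for illustration.

```python
import re

# Hypothetical 800-residue sequence (not real FGFR1 data) with "V" placed
# at position 564 (index 563) so the pattern *should* find it.
seq = "M" + "A" * 562 + "V" + "A" * 236

# Shape it like the filtered file: description line first, then the
# sequence wrapped at 60 residues per line (assumed line width).
lines = [seq[i:i + 60] for i in range(0, len(seq), 60)]
record = "fibroblast growth factor receptor 1 isoform X\n" + "\n".join(lines)

AA564 = re.compile(r"^M.{562}[IVE]")

# 1) Without flags, ^ only matches at the start of the whole string,
#    and the string starts with the description, not "M".
print(AA564.search(record))  # None

# 2) re.MULTILINE lets ^ match at each line start, but "." never matches
#    "\n", so 562 residues spread over several lines still cannot match.
print(re.search(r"^M.{562}[IVE]", record, re.MULTILINE))  # None

# 3) Drop the description line and join the wrapped lines: now it matches.
body = "".join(record.split("\n")[1:])
m = AA564.search(body)
print(m.group(0)[-1], m.end())  # V 564
```

Using `re.DOTALL` instead would let `.` cross line breaks, but then every newline would be counted as a residue and shift the position being tested.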
Input file FGFR_fusion.fasta:
>fibroblast growth factor receptor 1 isoform
AAAAAAAAAE
With this code:
import re

with open("FGFR_fusion.fasta",'r') as fasta:
    test = fasta.read()
    print('test : ', test)
    # Second line of the first record = its sequence (header line dropped);
    # this assumes the sequence sits on a single line
    test2 = test.split(">")[1].split('\n')[1]
    print('test2 : ', test2)
    # del test[1]
    print('test2 : ', test2)
out = open("PSB HW_6.txt", 'w')
for x in test.split(">"):
    print('x : ', x)
    if 'fibroblast growth factor receptor 1 isoform' in x:
        print('test2 : ', test2)
        out.write(test2)
# Close the file so the write is flushed before it is reopened for reading
out.close()
with open("PSB HW_6.txt", 'r') as filtered:
    test2 = filtered.read()
    print('test2 : ', test2)
out = open("10ID.txt", 'w')
# 10th residue: any 9 characters from the start, then I, V or E
AA10 = re.compile('^.{9}[IVE]')
matches = re.finditer(AA10,test2)
print('matches : ', matches)
found = []
for match in matches:
    print('match.group() : ', match.group())
    found.append(match.group())
    print(f"Found {match.group(0)} at position {match.start()}\n")
    # Write only the 10th character, i.e. the matched residue
    out.write(match.group()[9])
out.close()
if found == []:
    print("No match found.")
Output:
test : >fibroblast growth factor receptor 1 isoform
AAAAAAAAAE
test2 : AAAAAAAAAE
test2 : AAAAAAAAAE
x :
x : fibroblast growth factor receptor 1 isoform
AAAAAAAAAE
test2 : AAAAAAAAAE
test2 : AAAAAAAAAE
matches : <callable_iterator object at 0x7fd97064cb50>
match.group() : AAAAAAAAAE
Found AAAAAAAAAE at position 0
and the resulting files:
PSB HW_6.txt: AAAAAAAAAE
10ID.txt: E
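For this particular check a regex is not strictly required: once each record's wrapped lines are joined, the 564th residue is simply `seq[563]` (Python strings are 0-based). Below is a minimal sketch of that idea; `read_fasta` and `residue_hits` are hypothetical helper names, not part of any library. Keeping each description separate from its sequence avoids concatenating several records together, and nothing has to be written to and re-read from an intermediate file. The demo reuses the 10-residue toy input above, wrapped over two lines, plus one record whose description does not contain the keyword.

```python
import io
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def residue_hits(records, keyword: str, position: int, allowed: str) -> Dict[str, str]:
    """Return {description: residue} for records whose description contains
    `keyword` and whose residue at 1-based `position` is one of `allowed`."""
    hits = {}
    for header, seq in records:
        if keyword in header and len(seq) >= position:
            residue = seq[position - 1]  # 1-based position -> 0-based index
            if residue in allowed:
                hits[header] = residue
    return hits


# Toy input: the 10-residue example, wrapped over two lines,
# plus a record whose description does not contain the keyword.
demo = io.StringIO(
    ">fibroblast growth factor receptor 1 isoform\n"
    "AAAAA\n"
    "AAAAE\n"
    ">other protein\n"
    "AAAAAAAAAV\n"
)
result = residue_hits(read_fasta(demo), "fibroblast growth factor receptor 1 isoform", 10, "IVE")
print(result)  # {'fibroblast growth factor receptor 1 isoform': 'E'}
```

For the real file, passing an open handle (for example `with open("FGFR_fusion.fasta") as fh:` followed by `residue_hits(read_fasta(fh), "fibroblast growth factor receptor 1 isoform", 564, "IVE")`) would apply the same check at position 564. If Biopython is available, `Bio.SeqIO.parse(handle, "fasta")` can replace the parsing step; each record it yields exposes `record.description` and `record.seq`.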