I have a FASTA file containing sequences. I only want to extract the header information and display it.
I'm new to Python coding.
# 'with open' opens the file using "f" as the file handle and closes it automatically when the block ends.
with open("/home/rightmire/Downloads/fastafile", "r") as f:
    for line in f:  # Creates a for loop to read the file line by line
        print(line)  # On the first pass through the loop, this is the first line
        # If you comment out the break, the file will continue to be read line by line
        # If you want just the first line, you can break out of the loop
        break
# Even though the loop has ended, the last value of the variable 'line' is remembered
print("The data retained in the variable 'line' is: ", line)
Output:
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
The data retained in the variable 'line' is: >gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
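The loop above stops after the first line, which is the only header when the file holds a single record. A FASTA file can contain many records, each introduced by a line starting with '>', so printing every header means keeping only those lines. The following is a minimal sketch; the function names `iter_headers` and `print_headers` are hypothetical, chosen for illustration.

```python
from typing import Iterable, Iterator


def iter_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA header line (one starting with '>'), newline stripped.

    'lines' can be an open file object, which Python iterates line by line,
    so the whole file is never held in memory at once.
    """
    for line in lines:
        if line.startswith(">"):  # FASTA header lines begin with '>'
            yield line.rstrip("\n")  # drop the newline so print() adds no blank line


def print_headers(path: str) -> None:
    """Print every header line in the FASTA file at 'path' (hypothetical helper)."""
    with open(path, "r") as f:
        for header in iter_headers(f):
            print(header)
```

Calling `print_headers("/home/rightmire/Downloads/fastafile")` would print one header per record, without the extra blank line that `print(line)` produces when the line still ends in a newline.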
You can also do this without a loop or 'with'.
f = open("/home/rightmire/Downloads/fastafile", "r")
line = f.readline()  # Reads one line
print(line)
f.close()  # Closes the open file
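Note that `readline()` keeps the trailing newline (which is why `print(line)` leaves an empty line after it) and returns an empty string at end of file. A minimal sketch that reads only the first line, strips the newline, and returns it only if it looks like a FASTA header; `first_header` is a hypothetical name, and the check assumes the file starts directly with a header rather than with blank lines or comments.

```python
import io
from typing import Optional


def first_header(f: io.TextIOBase) -> Optional[str]:
    """Return the first line of an open text file if it is a FASTA header, else None."""
    line = f.readline()        # '' at end of file; otherwise usually ends with '\n'
    line = line.rstrip("\n")   # remove the newline so print() shows no extra blank line
    return line if line.startswith(">") else None
```

With a file opened via `with open(...) as f:`, `print(first_header(f))` prints the header, or `None` if the first line is not one.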
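Finally, you can read the entire file into memory. Then you can work with the file as a whole, operate on individual lines, or even parse it character by character. However, this may not be the best idea, because the file could be very large!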
# 'with open' opens the file using "f" as the file handle and closes it when the block ends.
with open("/home/rightmire/Downloads/fastafile", "r") as f:
    # Read the entire file into the variable 'contents' as one string
    contents = f.read()
# Split 'contents' on the newline character to get individual lines.
for line in contents.split("\n"):
    print("--------")
    print(line)
# Or even read it out character by character, which can be handy for parsing the genome data.
for c in contents:
    print(c)
Output:
--------
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
--------
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
--------
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
--------
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
--------
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
--------
>
g
i
|
1
(snip)
M
N
S
E
(snip)
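None of the approaches above separates the header from the sequence data or splits the header into its parts. The header in this example follows the older NCBI pipe-delimited layout (`gi|number|database|accession| description`). Assuming every header in the file uses exactly that layout (many FASTA files do not), a minimal sketch could pick the header lines out of the in-memory text and split each one on `|`. The function names and the dictionary keys are illustrative choices, not a standard.

```python
from typing import Dict, List


def headers_from_text(text: str) -> List[str]:
    """Return all header lines from FASTA text already read into memory."""
    return [line for line in text.splitlines() if line.startswith(">")]


def parse_gi_header(header: str) -> Dict[str, str]:
    """Split a 'gi|number|db|accession| description' header into named parts.

    Assumes exactly the layout seen in this example; raises ValueError otherwise.
    """
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA header: {header!r}")
    # maxsplit=4 keeps any '|' inside the free-text description intact
    parts = header[1:].rstrip("\n").split("|", 4)
    if len(parts) != 5 or parts[0] != "gi":
        raise ValueError(f"unexpected header layout: {header!r}")
    return {
        "gi": parts[1],
        "db": parts[2],
        "accession": parts[3],
        "description": parts[4].strip(),
    }
```

For text returned by `f.read()`, `[parse_gi_header(h) for h in headers_from_text(text)]` gives one dictionary per record; a header with a different layout raises `ValueError` instead of being split incorrectly.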