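My code is meant to convert a text file in the following format:
> gene name
gene sequence
into a CSV file with the gene name in one column of each row and the gene sequence in the other. However, some of the gene sequences spill over onto the next line.
How can I prevent this?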
import csv


def parse_gene_file(input_file):
    genes = []
    with open(input_file, 'r') as file:
        gene_name = None
        gene_sequence = []
        for line in file:
            line = line.strip()
            if line.startswith('>'):
                # A new header means we have just finished reading the previous gene.
                if gene_name is not None:
                    genes.append((gene_name, ''.join(gene_sequence)))
                    print(f"Added gene: {gene_name} with sequence: {''.join(gene_sequence)}")
                # Drop the '>' and any surrounding whitespace to get the new gene name.
                gene_name = line[1:].strip()
                gene_sequence = []
            else:
                # Sequence lines are joined later, so line breaks inside a sequence are removed.
                gene_sequence.append(line)
        # Make sure the last gene in the file is added as well.
        if gene_name is not None:
            genes.append((gene_name, ''.join(gene_sequence)))
            print(f"Added gene: {gene_name} with sequence: {''.join(gene_sequence)}")
    return genes


def write_to_csv(genes, output_file):
    # newline='' lets the csv module control line endings itself.
    with open(output_file, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['Gene Name', 'Gene Sequence'])
        writer.writerows(genes)


def main(input_file, output_file):
    genes = parse_gene_file(input_file)
    write_to_csv(genes, output_file)


if __name__ == "__main__":
    input_file = r'C:txt'  # input file name
    output_file = r'C:csv'  # output file name
    main(input_file, output_file)
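You can't. CSV files do not store column width information. In Excel, double-click the border between the column A and B headers, then the border between the B and C headers, to fit each column's width to its contents.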
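To check that the wrapping is only a display effect and not a problem in the file itself, one option is to read the CSV back and confirm that every record has exactly two fields and that no sequence contains a line break. This is a minimal sketch; `check_csv` is a hypothetical helper name, not part of the original code, and it assumes the two-column layout with a header row that `write_to_csv` produces.

```python
import csv


def check_csv(path):
    """Return a list of (record_number, problem) tuples for a two-column gene CSV.

    record_number counts data records after the header, starting at 1.
    """
    problems = []
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row
        for record_number, row in enumerate(reader, start=1):
            if len(row) != 2:
                problems.append((record_number, f"expected 2 fields, found {len(row)}"))
            elif '\n' in row[1] or '\r' in row[1]:
                problems.append((record_number, "sequence contains a line break"))
    return problems
```

An empty list suggests the file is fine and the sequence only appears to continue on the next line because of how the viewer renders long cells.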
As an alternative, use a package that supports XLSX output and specify the column widths there.
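As one possible illustration of the XLSX route, the sketch below uses openpyxl, a third-party package chosen here as an assumption (the answer does not name a specific one). `column_widths` and `write_to_xlsx` are hypothetical helper names. Widths are capped at 255 because that is the widest column Excel allows, so very long sequences will still not be fully visible in the cell.

```python
def column_widths(rows, max_width=255):
    """Return one width per column: the longest cell plus padding, capped at max_width."""
    widths = []
    for row in rows:
        for i, cell in enumerate(row):
            length = min(len(str(cell)) + 2, max_width)  # +2 for a little padding
            if i >= len(widths):
                widths.append(length)
            else:
                widths[i] = max(widths[i], length)
    return widths


def write_to_xlsx(genes, output_file):
    # openpyxl must be installed separately (pip install openpyxl). It is
    # imported here so that column_widths stays usable without it.
    from openpyxl import Workbook
    from openpyxl.utils import get_column_letter

    rows = [('Gene Name', 'Gene Sequence'), *genes]
    workbook = Workbook()
    sheet = workbook.active
    for row in rows:
        sheet.append(list(row))
    # Column widths are stored in the workbook, unlike in a CSV file.
    for index, width in enumerate(column_widths(rows), start=1):
        sheet.column_dimensions[get_column_letter(index)].width = width
    workbook.save(output_file)
```

With this in place, `main` could call `write_to_xlsx(genes, output_file)` instead of `write_to_csv`, with an output file name ending in `.xlsx`.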