Wrong column format in CSV file output


My code is meant to convert a text file in the following format:

> gene name
gene sequence

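into a CSV file in which each row holds the gene name in one column and the gene sequence in the other. However, some of the gene sequences spill over onto the next line.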

example image here

How can I prevent this?

import csv  # required by write_to_csv below

def parse_gene_file(input_file):
    genes = []
    with open(input_file, 'r') as file:
        gene_name = None
        gene_sequence = []
        for line in file:
            line = line.strip()
            if line.startswith('>'):
                # It means we have just finished reading the sequence of a gene.
                if gene_name is not None:
                    genes.append((gene_name, ''.join(gene_sequence)))
                    print(f"Added gene: {gene_name} with sequence: {''.join(gene_sequence)}")
                gene_name = line[1:].strip() # Set gene_name to the new gene name (strip the > and any leading/trailing whitespace)
                gene_sequence = []
            else:
                gene_sequence.append(line)
        if gene_name is not None:
            genes.append((gene_name, ''.join(gene_sequence))) # ensure the last gene is added to the genes list.
            print(f"Added gene: {gene_name} with sequence: {''.join(gene_sequence)}")
    return genes

def write_to_csv(genes, output_file):
    with open(output_file, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['Gene Name', 'Gene Sequence'])
        writer.writerows(genes)

def main(input_file, output_file):
    genes = parse_gene_file(input_file)
    write_to_csv(genes, output_file)

if __name__ == "__main__":
    input_file = r'C:txt'  # input file name
    output_file = r'C:csv'  # output file name
    main(input_file, output_file)
1 answer

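You can't. A CSV file doesn't store any column-width information. In Excel, double-click the border between the headers of columns A and B, then the border between columns B and C, to resize each column to fit its contents.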
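One way to confirm that the CSV data itself is intact, and that only Excel's display is at issue, is to read the file back with Python's `csv` module and check that every record has exactly two fields. This is a minimal sketch; the function name `check_csv_rows`, the file name `genes.csv`, and the sample genes are hypothetical.

```python
import csv

def check_csv_rows(csv_path):
    """Return (record_number, row) for every CSV record that does not have exactly 2 fields."""
    bad_rows = []
    with open(csv_path, newline='') as f:
        # enumerate counts CSV records (as parsed by csv.reader), not raw text lines
        for record_number, row in enumerate(csv.reader(f), start=1):
            if len(row) != 2:
                bad_rows.append((record_number, row))
    return bad_rows

if __name__ == "__main__":
    # Hypothetical sample data standing in for the output of parse_gene_file
    genes = [("geneA", "ATGC" * 50), ("geneB", "GGCCTTAA")]
    with open("genes.csv", "w", newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["Gene Name", "Gene Sequence"])
        writer.writerows(genes)
    print(check_csv_rows("genes.csv"))  # [] means every record has exactly two columns
```

If this prints an empty list, each gene really does occupy a single row with two columns, and what looks like a sequence "moving to the next line" is only how the spreadsheet program is rendering long text.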

Alternatively, use a package that supports XLSX output and set the column widths there.
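As an illustration of the XLSX route, here is a minimal sketch using the third-party `openpyxl` package (an assumption; the answer does not name a package, and XlsxWriter's `worksheet.set_column` would work similarly). The function name `write_to_xlsx`, the sheet title `Genes`, and the `MAX_WIDTH` cap are hypothetical choices, not part of the original code.

```python
from openpyxl import Workbook

# Assumption: an arbitrary cap so very long sequences don't produce an
# unmanageably wide column (Excel itself limits column widths to 255 characters).
MAX_WIDTH = 100

def write_to_xlsx(genes, output_file):
    """Write (name, sequence) pairs to an .xlsx file and size each column to its contents."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Genes"
    ws.append(["Gene Name", "Gene Sequence"])
    for name, sequence in genes:
        ws.append([name, sequence])

    # Column width is measured roughly in characters; pad by 2 so text isn't flush
    # with the cell border, and never exceed MAX_WIDTH.
    for column_cells in ws.columns:
        longest = max(len(str(cell.value)) for cell in column_cells if cell.value is not None)
        letter = column_cells[0].column_letter
        ws.column_dimensions[letter].width = min(longest + 2, MAX_WIDTH)

    wb.save(output_file)

if __name__ == "__main__":
    genes = [("geneA", "ATGCATGC"), ("geneB", "GGCC")]  # hypothetical sample data
    write_to_xlsx(genes, "genes.xlsx")
```

Because the widths are stored in the workbook itself, Excel opens the file with the columns already sized, which a CSV file cannot express.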

© www.soinside.com 2019 - 2024. All rights reserved.