Python: convert a tab-delimited file to JSON with supplied arguments and additional keys

Question (votes: 0, answers: 1)

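I want to convert a two-column tab-delimited file into "key: value" pairs and dump them as JSON, together with an extra set of keys and the argument sample_id = 'WGNP1000001'. Below are my input, the expected output format, and my code. Any help is appreciated. Thanks.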

For example, the tab-delimited input file WGNP1000001.list.txt:

insert_size 447.3
insert_size_std 98.2
pct_properly_paired 97.9
pct_mapped  99.63
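As an alternative to splitting lines by hand, the standard-library csv module can read this two-column layout with delimiter="\t". Because csv.reader drops the line terminator, the trailing newline never ends up in the values. This is a sketch under stated assumptions: read_metrics is a hypothetical helper, the sample file is inlined through io.StringIO, and each row is assumed to be exactly key<TAB>value.

```python
import csv
import io

# Inlined copy of WGNP1000001.list.txt (assumed format: key<TAB>value per line)
SAMPLE = "insert_size\t447.3\ninsert_size_std\t98.2\npct_properly_paired\t97.9\npct_mapped\t99.63\n"


def read_metrics(handle):
    """Hypothetical helper: read key<TAB>value rows into a dict, numbers as floats."""
    metrics = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        key, value = row[0].strip(), row[1].strip()
        try:
            metrics[key] = float(value)
        except ValueError:
            metrics[key] = value  # keep non-numeric values as strings
    return metrics


metrics = read_metrics(io.StringIO(SAMPLE))
print(metrics)
```

With a real file, open it with newline="" as the csv documentation recommends, e.g. with open("WGNP1000001.list.txt", newline="") as f: metrics = read_metrics(f).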

Expected JSON output format:

{
    "sample": {
        "id": "WGNP1000001"
    },
    "wgs_metrics": {
        "insert_size_std": 98.2,
        "insert_size": 447.3,
        "pct_mapped": 99.63,
        "pct_properly_paired": 97.9
    }
}
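The expected structure is just a nested dict passed to the json module. One detail worth knowing: json.dump(..., sort_keys=True), as used in the code below, orders keys alphabetically at every level, so insert_size is written before insert_size_std rather than in the order shown above. JSON objects are unordered, so both layouts are equivalent. In this sketch the metric values are hard-coded floats for illustration.

```python
import json

sample_id = "WGNP1000001"

# Metric values hard-coded as floats for illustration
wgs_metrics = {
    "insert_size_std": 98.2,
    "insert_size": 447.3,
    "pct_mapped": 99.63,
    "pct_properly_paired": 97.9,
}

# Nest the metrics and add the extra "sample" key holding the ID
payload = {"sample": {"id": sample_id}, "wgs_metrics": wgs_metrics}

# sort_keys=True sorts keys alphabetically at every nesting level
text = json.dumps(payload, sort_keys=True, indent=4)
print(text)
```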
sample_id='WGNP1000001'

count_aln.py --input_metrics "${sample_id}.list.txt" --sample_id ${sample_id} --output_json ${sample_id}.metrics.json

Code:

#!/usr/bin/env python3

import argparse
import json
import subprocess
import numpy as np
import sys
import os
from pathlib import Path


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--sample_id", dest="sample_id", required=True,
                        default=None,
                        help="Sample ID")
    parser.add_argument("--input_metrics", dest="input_metrics", required=True,
                        default=None,
                        help="Path to input aln metrics list")
    parser.add_argument("--output_json", dest="output_json", required=False,
                        default="./variant_counts.json",
                        help="Path to output file for variant metrics. Default: ./variant_counts.json")
    parser.add_argument("--scratch_dir", dest="scratch_dir", required=False,
                        default="./",
                        help="Path to scratch dir. Default: ./")
    args = parser.parse_args()

    # create scratch dir if it doesn't exist
    Path(args.scratch_dir).mkdir(parents=True, exist_ok=True)

    return args

def raw_data(input_metrics):
    d = {}
    # d = dict()
    with open(input_metrics) as f:
        rows = ( line.split('\t') for line in f )
        d = { row[0]:row[1] for row in rows }
        return d

def save_output(data_metrics, outfile):
    with open(outfile, "w") as f:
        data_metrics = {"sample" : {"id" : args.sample_id}, "wgs_metrics" : data_metrics}
        json.dump(data_metrics, f, sort_keys=True, indent=4)
        f.write("\n")

if __name__ == "__main__":
    args = parse_args()

    data_metrics = raw_data(args.input_metrics)
    save_output(data_metrics, args.output_json)

Actual output:

{
    "sample": {
        "id": "WGNP1000001"
    },
    "wgs_metrics": {
        "insert_size": "447.3\n",
        "insert_size_std": "98.2\n",
        "pct_mapped": "99.63\n",
        "pct_properly_paired": "97.9\n"
    }
}
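The "\n" in each value comes from the line itself: iterating over a file yields each line with its newline, and str.split("\t") leaves that newline attached to the last field, which json then writes out as a string. A minimal illustration on one hard-coded line:

```python
import json

line = "insert_size\t447.3\n"  # one line as read from the file, newline included

row = line.split("\t")
print(row)  # ['insert_size', '447.3\n']

# A str value is serialized as a JSON string, with the newline escaped
print(json.dumps({row[0]: row[1]}))  # {"insert_size": "447.3\n"}

# float() ignores surrounding whitespace, so the newline disappears
value = float(row[1])
print(json.dumps({row[0]: value}))  # {"insert_size": 447.3}
```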
Tags: json, python-3.x

1 answer (0 votes)

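You need to convert the second field of each line from a string to a float. Lines read from a file keep their trailing newline, which is why your values end in "\n"; strip it before converting, and keep the stripped string when a value is not numeric.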

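def raw_data(input_metrics):
    d = {}
    with open(input_metrics) as f:
        for line in f:  # iterate over the file object, one line at a time
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = line.split('\t')
            key = row[0]
            value_str = row[1].strip()
            try:
                value = float(value_str)  # numeric value -> float
            except ValueError:
                value = value_str  # non-numeric value stays a string
            d[key] = value
    return d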
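Putting the fix into the question's script could look like the sketch below, assuming the same flags as the original command line. Two changes go beyond the float conversion: save_output takes sample_id as a parameter instead of reading the global args, and the argument parsing is wrapped in a main(argv=None) function (a hypothetical name) so the script can also be called with an explicit argument list. The unused imports and the --scratch_dir option are dropped here.

```python
#!/usr/bin/env python3
"""Sketch: convert a two-column tab-separated metrics file to JSON."""

import argparse
import json


def raw_data(input_metrics):
    """Parse key<TAB>value lines; numeric values become floats."""
    d = {}
    with open(input_metrics) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            key, value_str = line.split("\t", 1)
            value_str = value_str.strip()
            try:
                d[key] = float(value_str)
            except ValueError:
                d[key] = value_str  # keep non-numeric values as text
    return d


def save_output(sample_id, data_metrics, outfile):
    """Wrap the metrics with the sample ID and write pretty-printed JSON."""
    payload = {"sample": {"id": sample_id}, "wgs_metrics": data_metrics}
    with open(outfile, "w") as f:
        json.dump(payload, f, sort_keys=True, indent=4)
        f.write("\n")


def main(argv=None):
    """Hypothetical entry point; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--sample_id", required=True, help="Sample ID")
    parser.add_argument("--input_metrics", required=True,
                        help="Path to input aln metrics list")
    parser.add_argument("--output_json", default="./variant_counts.json",
                        help="Path to output JSON. Default: ./variant_counts.json")
    args = parser.parse_args(argv)
    save_output(args.sample_id, raw_data(args.input_metrics), args.output_json)


if __name__ == "__main__":
    main()
```

The command line from the question stays the same: count_aln.py --input_metrics "${sample_id}.list.txt" --sample_id ${sample_id} --output_json ${sample_id}.metrics.json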
