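I want to convert a two-column, tab-separated file into "key: value" pairs and dump them as JSON, together with a fixed key that holds the parameter sample_id = 'WGNP1000001'. Below are my input, the expected output format, and my code. Any help is appreciated. Thanks.
For example, the tab-separated input file WGNP1000001.list.txt: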
insert_size 447.3
insert_size_std 98.2
pct_properly_paired 97.9
pct_mapped 99.63
Expected output JSON format:
{
    "sample": {
        "id": "WGNP1000001"
    },
    "wgs_metrics": {
        "insert_size_std": 98.2,
        "insert_size": 447.3,
        "pct_mapped": 99.63,
        "pct_properly_paired": 97.9
    }
}
sample_id='WGNP1000001'
count_aln.py --input_metrics "${sample_id}.list.txt" --sample_id "${sample_id}" --output_json "${sample_id}.metrics.json"
Code:
#!/usr/bin/env python3
import argparse
import json
import subprocess
import numpy as np
import sys
import os
from pathlib import Path


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--sample_id", dest="sample_id", required=True,
                        default=None,
                        help="Sample ID")
    parser.add_argument("--input_metrics", dest="input_metrics", required=True,
                        default=None,
                        help="Path to input aln metrics list")
    parser.add_argument("--output_json", dest="output_json", required=False,
                        default="./variant_counts.json",
                        help="Path to output file for variant metrics. Default: ./variant_counts.json")
    parser.add_argument("--scratch_dir", dest="scratch_dir", required=False,
                        default="./",
                        help="Path to scratch dir. Default: ./")
    args = parser.parse_args()
    # create scratch dir if it doesn't exist
    Path(args.scratch_dir).mkdir(parents=True, exist_ok=True)
    return args


def raw_data(input_metrics):
    d = {}
    # d = dict()
    with open(input_metrics) as f:
        rows = ( line.split('\t') for line in f )
        d = { row[0]:row[1] for row in rows }
    return d


def save_output(data_metrics, outfile):
    with open(outfile, "w") as f:
        data_metrics = {"sample" : {"id" : args.sample_id}, "wgs_metrics" : data_metrics}
        json.dump(data_metrics, f, sort_keys=True, indent=4)
        f.write("\n")


if __name__ == "__main__":
    args = parse_args()
    data_metrics = raw_data(args.input_metrics)
    save_output(data_metrics, args.output_json)
Output:
{
    "sample": {
        "id": "WGNP1000001"
    },
    "wgs_metrics": {
        "insert_size": "447.3\n",
        "insert_size_std": "98.2\n",
        "pct_mapped": "99.63\n",
        "pct_properly_paired": "97.9\n"
    }
}
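Each value is read as a string with its trailing newline still attached, so you need to parse the second item of each line from a string into a float.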
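def raw_data(input_metrics):
    d = {}
    with open(input_metrics) as f:
        for line in f:
            # remove the trailing newline and surrounding whitespace
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = line.split('\t')
            key = row[0]
            value_str = row[1]
            try:
                value = float(value_str.strip())
            except ValueError:
                # keep non-numeric values as plain strings
                value = value_str.strip()
            d[key] = value
    return d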
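
If you want to check the result without touching any files, here is a minimal, self-contained sketch of the same idea. It is not part of the original script: the names parse_metrics and build_report are hypothetical, it uses the standard csv module instead of split('\t'), and it passes the sample ID in explicitly rather than reading the global args. The sample text assumes the input columns really are separated by tabs.

```python
import csv
import io
import json


def parse_metrics(lines):
    """Parse two-column, tab-separated lines into a dict.

    Values that look numeric are converted to float; anything else
    is kept as a stripped string. (Hypothetical helper name.)
    """
    metrics = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        key, value = row[0].strip(), row[1].strip()
        try:
            metrics[key] = float(value)
        except ValueError:
            metrics[key] = value
    return metrics


def build_report(sample_id, metrics):
    """Wrap the metrics in the nested layout from the question."""
    return {"sample": {"id": sample_id}, "wgs_metrics": metrics}


# In-memory stand-in for WGNP1000001.list.txt (assumed tab-separated).
sample_text = (
    "insert_size\t447.3\n"
    "insert_size_std\t98.2\n"
    "pct_properly_paired\t97.9\n"
    "pct_mapped\t99.63\n"
)

report = build_report("WGNP1000001", parse_metrics(io.StringIO(sample_text)))
print(json.dumps(report, sort_keys=True, indent=4))
```

Because sort_keys=True is used, the keys are written alphabetically, so the order inside wgs_metrics may differ from the expected output in the question; the content is the same.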