I have a .json file in the following format:
{
    "uls": {
        "equ1-L1-u": {"D": 1.10, "La": 1.50, "Lb": 1.50},
        "equ1-L2-u": {"D": 1.10, "La": 1.50, "Lb": 1.50}
    },
    "sls": {
        "cha-L1": {"Ld": 1.00, "Le": 1.00, "Lf": 1.00, "Lg": 1.00, "Lh": 1.00},
        "cha-L2": {"D": 1.00, "Df": 1.00}
    }
}
I would like to convert it to a CSV "database"-style format:
Criteria, Name, D, Df, La, Lb, Ld, Le, Lf, Lg, Lh
uls, equ1-L1-u, 1.10, , 1.50, 1.50, , , , ,
uls, equ1-L2-u, 1.10, , 1.50, 1.50, , , , ,
sls, cha-L1, , , , , 1.00, 1.00, 1.00, 1.00, 1.00
sls, cha-L2, 1.00, 1.00, , , , , , ,
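Ideally I would not have to define the value keys in advance, but for now I can do that if necessary.
This is what I have so far. It works for the specific case of 2 levels of nesting. I could modify the code to handle 3 or 4 levels, but ideally the same code would work for any nesting depth (perhaps with recursion?).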
import json
import csv

# Function to convert json objects to csv
def make_csv_dict(data, key_headers):
    csv_dict = []
    for i in data:
        for j in data[i]:
            csv_dict.append({
                key_headers[0]: i,
                key_headers[1]: j,
                **data[i][j]
            })
    return csv_dict

### ENTER DATA HERE ###
key_headers = ["Criteria", "Name"]
path = "File.json"
### ENTER DATA HERE ###

# Read json
with open(path) as json_file:
    data = json.load(json_file)

# make csv_dict from .json data
csv_dict = make_csv_dict(data, key_headers)

# writing to csv file
fieldnames = ["Criteria", "Name", "D", "Df", "La", "Lb", "Lc", "Ld", "Le", "Lf", "Lg", "Lh", "Sl", "Sh", "W", "T", "A", "E"]
with open(path.replace(".json", ".csv"), 'w', newline="") as f:
    writer = csv.DictWriter(f, fieldnames)
    writer.writeheader()
    writer.writerows(csv_dict)
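On the wish to avoid predefining the value keys: one possible approach is to derive the header from the rows themselves. The following is only a sketch. `collect_fieldnames` is a hypothetical helper (not part of the code above), the sample rows stand in for the output of `make_csv_dict`, and an in-memory buffer replaces the output file so the snippet runs on its own.

```python
import csv
import io

def collect_fieldnames(rows, key_headers):
    """Return key_headers followed by the sorted union of all other keys."""
    value_keys = set()
    for row in rows:
        value_keys.update(k for k in row if k not in key_headers)
    return list(key_headers) + sorted(value_keys)

# Sample rows shaped like the output of make_csv_dict (illustrative only).
rows = [
    {"Criteria": "uls", "Name": "equ1-L1-u", "D": 1.10, "La": 1.50, "Lb": 1.50},
    {"Criteria": "sls", "Name": "cha-L2", "D": 1.00, "Df": 1.00},
]
fieldnames = collect_fieldnames(rows, ["Criteria", "Name"])

# restval="" leaves a cell empty when a row lacks that key.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Sorting the value keys gives a stable column order. If the order of first appearance matters more, a list with a membership check could replace the set.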
You can use pd.json_normalize and pivot_table() to achieve this, with some additional processing.
First, use json_normalize:
import pandas as pd
data = {
    "uls": {
        "equ1-L1-u": {"D": 1.10, "La": 1.50, "Lb": 1.50},
        "equ1-L2-u": {"D": 1.10, "La": 1.50, "Lb": 1.50},
    },
    "sls": {
        "cha-L1": {"Ld": 1.00, "Le": 1.00, "Lf": 1.00, "Lg": 1.00, "Lh": 1.00},
        "cha-L2": {"D": 1.00, "Df": 1.00},
    }
}
df = pd.json_normalize(data, sep='_')
print(df)
uls_equ1-L1-u_D uls_equ1-L1-u_La ... sls_cha-L2_D sls_cha-L2_Df
0 1.1 1.5 ... 1.0 1.0
Next, transpose the DataFrame:
df = df.T.reset_index().rename(columns = {0: 'Values'})
print(df)
index Values
0 uls_equ1-L1-u_D 1.1
1 uls_equ1-L1-u_La 1.5
2 uls_equ1-L1-u_Lb 1.5
3 uls_equ1-L2-u_D 1.1
4 uls_equ1-L2-u_La 1.5
5 uls_equ1-L2-u_Lb 1.5
6 sls_cha-L1_Ld 1.0
7 sls_cha-L1_Le 1.0
8 sls_cha-L1_Lf 1.0
9 sls_cha-L1_Lg 1.0
10 sls_cha-L1_Lh 1.0
11 sls_cha-L2_D 1.0
12 sls_cha-L2_Df 1.0
Now we can split the index column into multiple columns, using the separator defined in the json_normalize call:
df[['Criteria', 'Name', 'SubName']] = df['index'].str.split('_', expand=True)
print(df)
index Values Criteria Name SubName
0 uls_equ1-L1-u_D 1.1 uls equ1-L1-u D
1 uls_equ1-L1-u_La 1.5 uls equ1-L1-u La
2 uls_equ1-L1-u_Lb 1.5 uls equ1-L1-u Lb
3 uls_equ1-L2-u_D 1.1 uls equ1-L2-u D
4 uls_equ1-L2-u_La 1.5 uls equ1-L2-u La
5 uls_equ1-L2-u_Lb 1.5 uls equ1-L2-u Lb
6 sls_cha-L1_Ld 1.0 sls cha-L1 Ld
7 sls_cha-L1_Le 1.0 sls cha-L1 Le
8 sls_cha-L1_Lf 1.0 sls cha-L1 Lf
9 sls_cha-L1_Lg 1.0 sls cha-L1 Lg
10 sls_cha-L1_Lh 1.0 sls cha-L1 Lh
11 sls_cha-L2_D 1.0 sls cha-L2 D
12 sls_cha-L2_Df 1.0 sls cha-L2 Df
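One caveat about this split: it relies on '_' never appearing inside a key. If a name such as "equ_1" were present, the split would produce more than three parts and the column assignment would fail. A possible workaround, sketched below with hypothetical data, is to choose a separator that is assumed not to occur in any key (here '|') and use it in both json_normalize and the split.

```python
import pandas as pd

# Hypothetical data: the name "equ_1" contains an underscore.
data = {"uls": {"equ_1": {"D": 1.10, "La": 1.50}}}

SEP = "|"  # assumption: this character never occurs in any key

df = pd.json_normalize(data, sep=SEP)
df = df.T.reset_index().rename(columns={0: "Values"})

# A single-character pattern is treated as a literal string, not a regex.
df[["Criteria", "Name", "SubName"]] = df["index"].str.split(SEP, expand=True)
print(df[["Criteria", "Name", "SubName", "Values"]])
```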
Finally, we pivot the DataFrame, rename the index, and fill the NaN values:
pivot_df = df.pivot_table(index=['Criteria', 'Name'], columns='SubName', values='Values', aggfunc='first').reset_index().rename_axis(None, axis=1).fillna('')
print(pivot_df)
Criteria Name D Df La Lb Ld Le Lf Lg Lh
0 sls cha-L1 1.0 1.0 1.0 1.0 1.0
1 sls cha-L2 1.0 1.0
2 uls equ1-L1-u 1.1 1.5 1.5
3 uls equ1-L2-u 1.1 1.5 1.5
Putting it all together:
import pandas as pd
data = {
    "uls": {
        "equ1-L1-u": {"D": 1.10, "La": 1.50, "Lb": 1.50},
        "equ1-L2-u": {"D": 1.10, "La": 1.50, "Lb": 1.50},
    },
    "sls": {
        "cha-L1": {"Ld": 1.00, "Le": 1.00, "Lf": 1.00, "Lg": 1.00, "Lh": 1.00},
        "cha-L2": {"D": 1.00, "Df": 1.00},
    }
}
df = pd.json_normalize(data, sep='_')
df = df.T.reset_index().rename(columns = {0: 'Values'})
df[['Criteria', 'Name', 'SubName']] = df['index'].str.split('_', expand=True)
pivot_df = df.pivot_table(index=['Criteria', 'Name'], columns='SubName', values='Values', aggfunc='first').reset_index().rename_axis(None, axis=1).fillna('')
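The split into exactly three columns ties the pandas version to two key levels. For deeper nesting, the recursive idea raised in the question could be sketched with the standard library alone. This is a different approach from the pandas one above, not a drop-in extension of it. `flatten_rows` is a hypothetical helper, the three-level data and the "Case" header are invented for illustration, and the sketch assumes every branch is nested exactly as deep as there are key headers.

```python
import csv
import io

def flatten_rows(node, key_headers, path=()):
    """Yield one CSV row (a dict) per innermost dict.

    Assumes every branch is nested exactly len(key_headers) levels deep.
    """
    if len(path) == len(key_headers):
        # All key levels consumed: `node` now holds the value columns.
        yield {**dict(zip(key_headers, path)), **node}
        return
    for key, child in node.items():
        yield from flatten_rows(child, key_headers, path + (key,))

# Hypothetical three-level input: Criteria -> Case -> Name -> values.
data = {
    "uls": {
        "equ1": {
            "L1-u": {"D": 1.10, "La": 1.50},
            "L2-u": {"D": 1.10, "Lb": 1.50},
        },
    },
    "sls": {"cha": {"L1": {"Ld": 1.00}}},
}
key_headers = ["Criteria", "Case", "Name"]
rows = list(flatten_rows(data, key_headers))

# Header = key columns, then every value key found, in sorted order.
value_keys = sorted({k for row in rows for k in row} - set(key_headers))
buffer = io.StringIO()
writer = csv.DictWriter(buffer, key_headers + value_keys, restval="")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The nesting depth is driven entirely by the length of `key_headers`, so the same function covers the original two-level file when called with `["Criteria", "Name"]`. Rows keep the order of the input, whereas pivot_table sorts them by the index columns.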