I need to convert a .json file to .csv. The file has a deeply nested structure:
"lineItemSourcedId": 218410938,
"ext_inspera_assessmentRunTitle": "SPE1053 Inspera Digital Exam 20 May 2024",
"ext_inspera_assessmentRunExternalId": "EmOkVON97XbA8UczrBcjCAc9kt32VCafDe933mzoTMIYItqgmQdzEtO6O6q2W23k-a6cc1ca5c74b141b8e41e6f8d739ffbfc4d85f18",
"ext_inspera_maxTotalScore": 824,
"ext_inspera_candidates": [
  {
    "result": {
      "sourcedId": 30460745,
      "ext_inspera_userAssessmentSetupId": 33061896,
      "ext_inspera_userAssessmentId": 25160852,
      "dateLastModified": "2024-05-20T10:54:49Z",
      "ext_inspera_startTime": "2024-05-20T08:32:08Z",
      "ext_inspera_endTime": "2024-05-20T10:54:49Z",
      "ext_inspera_extraTimeMins": 0,
      "ext_inspera_incidentTimeMins": 0,
      "ext_inspera_candidateId": "230402199",
      "ext_inspera_attendance": true,
      "lineItem": {
        "sourcedId": 218410938,
        "type": "lineItem"
      },
      "student": {
        "sourcedId": 180533235,
        "type": "user"
      },
      "ext_inspera_autoScore": 65.5,
      "ext_inspera_questions": [
        {
          "ext_inspera_maxQuestionScore": 1,
          "ext_inspera_questionId": 210834195,
          "ext_inspera_questionContentItemId": 207749578,
          "ext_inspera_questionNumber": "1",
          "ext_inspera_questionTitle": "Select the word class for the underlined words",
          "ext_inspera_questionWeight": 1,
          "ext_inspera_durationSeconds": 24,
          "ext_inspera_autoScore": 0,
          "ext_inspera_candidateResponses": [
            {
              "ext_inspera_response": "rId5",
              "ext_inspera_interactionAlternative": "6"
            }
          ]
        },
With the following code, I managed to isolate the data I am interested in, which sits under the "ext_inspera_candidateResponses" key in the dictionary:
import json
import os

import pandas as pd

os.chdir("/path/to/directory")

# Load the raw JSON into a nested dict
with open("json_file.json") as json_file:
    jd = json.load(json_file)

# Flatten the innermost list of candidate responses into one row per response
output = pd.json_normalize(
    jd,
    record_path=["ext_inspera_candidates", "result",
                 "ext_inspera_questions", "ext_inspera_candidateResponses"],
)
output.to_csv("json_conversion_output.csv")
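This generates a .csv file containing the fields under "ext_inspera_candidateResponses".
However, it does not capture other key variables higher up in the tree. For example, I would also like to store "ext_inspera_candidateId" and "ext_inspera_autoScore" alongside each response.
I tried to do this by adding
meta="ext_inspera_candidateId"
but although this creates a column, its values are empty.
Is there a way to (i) extract key-value pairs lower in the parse tree while (ii) keeping key-value pairs higher in the parse tree?
Thanks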
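meta is the correct option, but it should be a list of paths to those fields, each path being the list of keys that leads from the root of the JSON to the field.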
output = pd.json_normalize(
    jd,
    record_path=["ext_inspera_candidates", "result",
                 "ext_inspera_questions", "ext_inspera_candidateResponses"],
    meta=[["ext_inspera_candidates", "result", "ext_inspera_candidateId"],
          ["ext_inspera_candidates", "result", "ext_inspera_autoScore"]],
)
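To see the effect without the real file, here is a minimal sketch on a made-up miniature of the structure above. The values ("A1", "B2", the scores and IDs) and the `base` helper variable are assumptions for illustration only and do not come from the original data. It also pulls "ext_inspera_questionId" from the question level, and shortens the resulting column names, since json_normalize names each meta column by joining its path with ".".

```python
import pandas as pd

# Hypothetical miniature of the nested structure; all values are invented.
jd = {
    "lineItemSourcedId": 1,
    "ext_inspera_candidates": [
        {
            "result": {
                "ext_inspera_candidateId": "A1",
                "ext_inspera_autoScore": 2.5,
                "ext_inspera_questions": [
                    {
                        "ext_inspera_questionId": 10,
                        "ext_inspera_candidateResponses": [
                            {"ext_inspera_response": "rId1",
                             "ext_inspera_interactionAlternative": "3"},
                            {"ext_inspera_response": "rId2",
                             "ext_inspera_interactionAlternative": "4"},
                        ],
                    },
                    {
                        "ext_inspera_questionId": 11,
                        "ext_inspera_candidateResponses": [
                            {"ext_inspera_response": "rId3",
                             "ext_inspera_interactionAlternative": "1"},
                        ],
                    },
                ],
            }
        },
        {
            "result": {
                "ext_inspera_candidateId": "B2",
                "ext_inspera_autoScore": 4.0,
                "ext_inspera_questions": [
                    {
                        "ext_inspera_questionId": 10,
                        "ext_inspera_candidateResponses": [
                            {"ext_inspera_response": "rId1",
                             "ext_inspera_interactionAlternative": "2"},
                        ],
                    },
                ],
            }
        },
    ],
}

# Shared prefix of every path (helper name is illustrative).
base = ["ext_inspera_candidates", "result"]

output = pd.json_normalize(
    jd,
    record_path=base + ["ext_inspera_questions", "ext_inspera_candidateResponses"],
    meta=[
        base + ["ext_inspera_candidateId"],                          # candidate level
        base + ["ext_inspera_autoScore"],                            # candidate level
        base + ["ext_inspera_questions", "ext_inspera_questionId"],  # question level
    ],
)

# Meta columns come back as "ext_inspera_candidates.result.<field>" and so on;
# keep only the last path component. This is safe here because no two
# selected fields share the same final key.
output.columns = [col.split(".")[-1] for col in output.columns]

print(output)
```

Each response becomes one row, with the candidate-level and question-level values repeated on every row that belongs to them. Note that the question-level "ext_inspera_autoScore" is deliberately left out: after shortening the names it would collide with the candidate-level one. When writing the result, output.to_csv("json_conversion_output.csv", index=False) avoids the extra unnamed index column.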