json_normalize - how do I keep variables that sit higher up in the parse tree?


I need to convert a .json file to .csv. The file has an extremely complex structure:

"lineItemSourcedId": 218410938,
"ext_inspera_assessmentRunTitle": "SPE1053 Inspera Digital Exam 20 May 2024",
"ext_inspera_assessmentRunExternalId": "EmOkVON97XbA8UczrBcjCAc9kt32VCafDe933mzoTMIYItqgmQdzEtO6O6q2W23k-a6cc1ca5c74b141b8e41e6f8d739ffbfc4d85f18",
"ext_inspera_maxTotalScore": 824,
"ext_inspera_candidates": [
    {
        "result": {
            "sourcedId": 30460745,
            "ext_inspera_userAssessmentSetupId": 33061896,
            "ext_inspera_userAssessmentId": 25160852,
            "dateLastModified": "2024-05-20T10:54:49Z",
            "ext_inspera_startTime": "2024-05-20T08:32:08Z",
            "ext_inspera_endTime": "2024-05-20T10:54:49Z",
            "ext_inspera_extraTimeMins": 0,
            "ext_inspera_incidentTimeMins": 0,
            "ext_inspera_candidateId": "230402199",
            "ext_inspera_attendance": true,
            "lineItem": {
                "sourcedId": 218410938,
                "type": "lineItem"
            },
            "student": {
                "sourcedId": 180533235,
                "type": "user"
            },
            "ext_inspera_autoScore": 65.5,
            "ext_inspera_questions": [
                {
                    "ext_inspera_maxQuestionScore": 1,
                    "ext_inspera_questionId": 210834195,
                    "ext_inspera_questionContentItemId": 207749578,
                    "ext_inspera_questionNumber": "1",
                    "ext_inspera_questionTitle": "Select the word class for the underlined words",
                    "ext_inspera_questionWeight": 1,
                    "ext_inspera_durationSeconds": 24,
                    "ext_inspera_autoScore": 0,
                    "ext_inspera_candidateResponses": [
                        {
                            "ext_inspera_response": "rId5",
                            "ext_inspera_interactionAlternative": "6"
                        }
                    ]
                },

With the following code I managed to isolate the data I am interested in, which sits under the "ext_inspera_candidateResponses" key in the dictionaries:

import json
import os

import pandas as pd

os.chdir("/path/to/directory")

# Load the raw JSON export into a Python dict
with open("json_file.json") as json_file:
    jd = json.load(json_file)

# Walk down to the innermost list; each candidate response becomes one row
output = pd.json_normalize(
    jd,
    record_path=[
        "ext_inspera_candidates",
        "result",
        "ext_inspera_questions",
        "ext_inspera_candidateResponses",
    ],
)

output.to_csv("json_conversion_output.csv")

This produces the following .csv file:

CSV file resulting from conversion

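However, this does not capture other key variables that sit higher up in the tree. For example, I would like to store "ext_inspera_candidateId" and "ext_inspera_autoScore" as well, giving the following: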

Second CSV file

I tried to do this by adding

meta= "ext_inspera_candidateId"

but although this creates a column, its values are empty.

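Is there a way to (i) extract key-value pairs that are lower in the parse tree while (ii) keeping key-value pairs that are higher in the parse tree?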

Thanks

python json

1 answer

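meta is the right option, but it should be a list of paths to those fields, with each path written out in full from the top level of the JSON: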

output = pd.json_normalize(
    jd,
    record_path = ["ext_inspera_candidates", "result", "ext_inspera_questions", "ext_inspera_candidateResponses"],
    meta = [["ext_inspera_candidates", "result", "ext_inspera_candidateId"],
            ["ext_inspera_candidates", "result", "ext_inspera_autoScore"]]
)
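
To try this without the original file, here is a minimal sketch on a hypothetical, heavily trimmed copy of the structure from the question (the second question and its "rId2" response are invented for illustration). It assumes, as in the question, that "result" is a single dict per candidate. The meta columns come back named after their full dotted path, so the optional rename at the end keeps only the last part of each name; that step is my own addition, not something the question asked for.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for the export shown in the question
jd = {
    "lineItemSourcedId": 218410938,
    "ext_inspera_candidates": [
        {
            "result": {
                "ext_inspera_candidateId": "230402199",
                "ext_inspera_autoScore": 65.5,
                "ext_inspera_questions": [
                    {
                        "ext_inspera_questionId": 210834195,
                        "ext_inspera_candidateResponses": [
                            {
                                "ext_inspera_response": "rId5",
                                "ext_inspera_interactionAlternative": "6",
                            }
                        ],
                    },
                    {
                        # Invented second question, for illustration only
                        "ext_inspera_questionId": 210834196,
                        "ext_inspera_candidateResponses": [
                            {
                                "ext_inspera_response": "rId2",
                                "ext_inspera_interactionAlternative": "3",
                            }
                        ],
                    },
                ],
            }
        }
    ],
}

# Shared prefix down to the per-candidate "result" dict
candidate = ["ext_inspera_candidates", "result"]

output = pd.json_normalize(
    jd,
    record_path=candidate + ["ext_inspera_questions", "ext_inspera_candidateResponses"],
    meta=[
        candidate + ["ext_inspera_candidateId"],
        candidate + ["ext_inspera_autoScore"],
    ],
)

# Meta columns are named by their dotted path; keep only the final key
output = output.rename(columns=lambda name: name.split(".")[-1])
print(output)
```

Each response row should now carry the candidate ID and the candidate-level auto score alongside it. Note that "ext_inspera_autoScore" also exists at question level in the real file; the path used here deliberately points at the one directly under "result".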