Python DataFrame groupby: converting NaN to None to generate valid JSON


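I receive several JSON documents that I have to transform and join, which I do with pandas, and then I have to produce a JSON document again. The structure of the final JSON is fixed. Sometimes fields are missing from the input JSON (which is valid), but I must keep those fields in the joined object, and that also works fine. I have to convert all NaN values to None to get valid JSON containing null values, but after the groupby operation some of the None values are converted back to NaN. See the attached example: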

import pandas as pd
import json
dict1 =  {
     "items": [
    {
      "name": "Project1",
      "projectId": "1",
    },
    {
      "name": "Project2",
      "projectId": "2",
    },
    {
      "name": "Project3",    
      "projectId": "3",
    }
    ]
}

dict2 = {
     "items": [
    {
      "attr1": "ABC",
      "attr2": "DEF1",
      "attr3": "GHI1",
      "projectId": "1",
      "services":[
          {
              "sname": "Service1",
          },
          {
              "sname": "Service2",
          }
      ]
    },
    {
      "attr1": "ABC",
      "attr2": "DEF2",
      "attr3": "GHI2",
      "projectId": "2",
      "services":[
          {
              "sname": "Service1",
          },
          {
              "sname": "Service2",
          }
      ]
    }
  ]
}

dict_head = {
    "id":"some-guid",
    "name":"some name",
    "content" :[
    ]
}

df1 = pd.DataFrame(dict1["items"])
# df2 = pd.DataFrame(dict2["items"])
df2 = pd.json_normalize(
        data = dict2['items'],
        record_path = ['services'], 
        meta = [
            'projectId', 
            'attr1',
            'attr2',
            'attr3'
        ]
    )


df_joined = df1.set_index("projectId").join(df2.set_index("projectId"))
print("df_joined_1")
print(df_joined)

# convert all NaN values to None, which works well
df_joined= df_joined.where(pd.notnull(df_joined), None)
print("df_joined_2")
print(df_joined)

df_grouped = df_joined.groupby(['projectId','name','attr1','attr2','attr3'], dropna=False)['sname'].apply(list).reset_index().to_dict(orient='records')
# suddenly the None values of the grouped fields are converted back to NaN???
print("df_grouped:")
print(df_grouped)

dict_head["content"] = df_grouped
print("dict_head:")
print(dict_head)


print("dict_head as json:")
print(json.dumps(dict_head, indent=3))

Output for Project 3: NaN appears where null should be. I want all NaN values to be null:

{
         "projectId": "3",
         "name": "Project3",
         "attr1": NaN,
         "attr2": NaN,
         "attr3": NaN,
         "sname": [
            null
         ]
      }
python json pandas group-by
1 Answer

You don't need to convert NaN to None or explicitly convert the DataFrame to a dict. If you use DataFrame.to_json, pandas does this for you.
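As a quick check of that behavior, here is a minimal sketch. The two-row frame is hypothetical and only stands in for the joined data from the question; it assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: the second row has a missing attr1 value.
df = pd.DataFrame({"projectId": ["1", "3"], "attr1": ["ABC", np.nan]})

# With no path argument, DataFrame.to_json returns the JSON text as a string.
# Missing values (NaN) are written as JSON null.
json_text = df.to_json(orient="records")
print(json_text)
```

Applied to the data from the question, the same idea looks as follows.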

df_joined = df1.set_index("projectId").join(df2.set_index("projectId"))

# remove the `.where` call

df_grouped = (
    df_joined.groupby(["projectId", "name", "attr1", "attr2", "attr3"], dropna=False)[
        "sname"
    ]
    .apply(list)
    .reset_index()
    # remove the `.to_dict` call
)

df_grouped.to_json("new_file.json", orient="records", indent=3)
Contents of new_file.json:
[
   {
      "projectId":"1",
      "name":"Project1",
      "attr1":"ABC",
      "attr2":"DEF1",
      "attr3":"GHI1",
      "sname":[
         "Service1",
         "Service2"
      ]
   },
   {
      "projectId":"2",
      "name":"Project2",
      "attr1":"ABC",
      "attr2":"DEF2",
      "attr3":"GHI2",
      "sname":[
         "Service1",
         "Service2"
      ]
   },
   {
      "projectId":"3",
      "name":"Project3",
      "attr1":null,
      "attr2":null,
      "attr3":null,
      "sname":[
         null
      ]
   }
]
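The question also wraps the records in a fixed header (dict_head). One possible way to keep that structure, sketched here under assumptions, is to let pandas serialize the frame and parse the text back with json.loads, so that null becomes None before the records are placed under "content". The small df_grouped below is a hypothetical stand-in for the grouped frame above, not the full data.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for the grouped frame built in the answer above.
df_grouped = pd.DataFrame(
    {
        "projectId": ["1", "3"],
        "name": ["Project1", "Project3"],
        "attr1": ["ABC", np.nan],
        "sname": [["Service1", "Service2"], [np.nan]],
    }
)

# Header structure taken from the question.
dict_head = {"id": "some-guid", "name": "some name", "content": []}

# Serialize with pandas (NaN -> null), then parse the text back so that
# null becomes None in plain Python objects.
dict_head["content"] = json.loads(df_grouped.to_json(orient="records"))

# allow_nan=False makes json.dumps raise ValueError if a NaN slipped through.
result = json.dumps(dict_head, indent=3, allow_nan=False)
print(result)
```

The round trip costs an extra serialization pass, which is usually negligible for small result sets like this one.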