Python DataFrame groupby: converting NaN to None to produce valid JSON

Question

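I receive several JSON documents that I have to transform and join, which I do with pandas, and then I have to produce a JSON document again. The structure of the final JSON is fixed. Sometimes certain fields are missing from the input JSON (which is valid), but I have to keep those fields in the joined object, and that part works. I have to convert all NaN values to None to get valid JSON containing null values, but after the groupby operation some of the None values are converted back to NaN. See the example below: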

import pandas as pd
import json
dict1 =  {
     "items": [
    {
      "name": "Project1",
      "projectId": "1",
    },
    {
      "name": "Project2",
      "projectId": "2",
    },
    {
      "name": "Project3",    
      "projectId": "3",
    }
    ]
}

dict2 = {
     "items": [
    {
      "attr1": "ABC",
      "attr2": "DEF1",
      "attr3": "GHI1",
      "projectId": "1",
      "services":[
          {
              "sname": "Service1",
          },
          {
              "sname": "Service2",
          }
      ]
    },
    {
      "attr1": "ABC",
      "attr2": "DEF2",
      "attr3": "GHI2",
      "projectId": "2",
      "services":[
          {
              "sname": "Service1",
          },
          {
              "sname": "Service2",
          }
      ]
    }
  ]
}

dict_head = {
    "id":"some-guid",
    "name":"some name",
    "content" :[
    ]
}

df1 = pd.DataFrame(dict1["items"])
# df2 = pd.DataFrame(dict2["items"])
df2 = pd.json_normalize(
        data = dict2['items'],
        record_path = ['services'], 
        meta = [
            'projectId', 
            'attr1',
            'attr2',
            'attr3'
        ]
    )


df_joined = df1.set_index("projectId").join(df2.set_index("projectId"))
print("df_joined_1")
print(df_joined)

# convert all NaN values to None, which works well
df_joined= df_joined.where(pd.notnull(df_joined), None)
print("df_joined_2")
print(df_joined)

df_grouped = df_joined.groupby(['projectId','name','attr1','attr2','attr3'], dropna=False)['sname'].apply(list).reset_index().to_dict(orient='records')
# suddenly the None values of the grouped fields are converted back to NaN???
print("df_grouped:")
print(df_grouped)

dict_head["content"] = df_grouped
print("dict_head:")
print(dict_head)


print("dict_head as json:")
print(json.dumps(dict_head, indent=3))

Output for Project 3: attr1, attr2 and attr3 come out as NaN, while I want every NaN to appear as null:

{
         "projectId": "3",
         "name": "Project3",
         "attr1": NaN,
         "attr2": NaN,
         "attr3": NaN,
         "sname": [
            null
         ]
      }
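For context, the NaN tokens in this output come from the standard library's json module, which by default writes float NaN as the non-standard token NaN. Only None is written as null. The snippet below is a minimal sketch of that behaviour using a hand-made record (the `record` and `clean` names are illustrative and not part of the original code):

```python
import json
import math

# Illustrative record that mimics the Project 3 row after the groupby.
record = {"projectId": "3", "attr1": float("nan")}

# By default json.dumps emits the non-standard token NaN.
lenient = json.dumps(record)

# With allow_nan=False it raises ValueError instead, which makes a
# convenient check that the output is strictly valid JSON.
try:
    json.dumps(record, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

# Replacing NaN with None yields JSON null.
clean = {
    key: (None if isinstance(value, float) and math.isnan(value) else value)
    for key, value in record.items()
}
as_json = json.dumps(clean, allow_nan=False)

print(lenient)
print(strict_ok)
print(as_json)
```

Passing allow_nan=False to the final json.dumps call is a cheap way to find out whether any NaN survived an earlier conversion step.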

Answer 1

You don't need to convert NaN to None, or to convert the DataFrame to a dict explicitly. If you use DataFrame.to_json, pandas does both for you.

df_joined = df1.set_index("projectId").join(df2.set_index("projectId"))

# remove the `.where` call

df_grouped = (
    df_joined.groupby(["projectId", "name", "attr1", "attr2", "attr3"], dropna=False)[
        "sname"
    ]
    .apply(list)
    .reset_index()
    # remove the `.to_dict` call
)

df_grouped.to_json("new_file.json", orient="records", indent=3)

[
   {
      "projectId":"1",
      "name":"Project1",
      "attr1":"ABC",
      "attr2":"DEF1",
      "attr3":"GHI1",
      "sname":[
         "Service1",
         "Service2"
      ]
   },
   {
      "projectId":"2",
      "name":"Project2",
      "attr1":"ABC",
      "attr2":"DEF2",
      "attr3":"GHI2",
      "sname":[
         "Service1",
         "Service2"
      ]
   },
   {
      "projectId":"3",
      "name":"Project3",
      "attr1":null,
      "attr2":null,
      "attr3":null,
      "sname":[
         null
      ]
   }
]
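The question also wraps the records in a fixed header (dict_head). One possible way to keep that structure, sketched here under assumptions, is to call to_json without a path so that it returns a string, parse that string back with json.loads, and place the result under "content". The small `df_grouped` frame below is a hand-built stand-in for the grouped result, not the original data:

```python
import json

import pandas as pd

# Stand-in for the grouped frame from the answer (illustrative values only).
df_grouped = pd.DataFrame(
    {
        "projectId": ["1", "3"],
        "name": ["Project1", "Project3"],
        "attr1": ["ABC", float("nan")],
        "sname": [["Service1", "Service2"], [float("nan")]],
    }
)

dict_head = {"id": "some-guid", "name": "some name", "content": []}

# Without a path argument, to_json returns a JSON string in which NaN is
# already written as null; json.loads turns those nulls into None.
records = json.loads(df_grouped.to_json(orient="records"))
dict_head["content"] = records

# allow_nan=False makes json.dumps raise if any NaN slipped through.
result = json.dumps(dict_head, indent=3, allow_nan=False)
print(result)
```

The round trip through a string costs an extra serialization pass, but it lets pandas handle the NaN-to-null mapping, including the NaN nested inside the sname lists.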
