Unnest a pandas JSON column while keeping the "id" column

Question (votes: 0, answers: 1)

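I'm working with some nested NoSQL data. I want to use json_normalize to unnest it while keeping the "id de transação" column, so that I can merge the resulting DataFrame with other DataFrames.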

import pandas as pd
import json

data = {
    "id de transação": [1, 2, 3, 4, 5],
    "nome": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "dados": [
        {"data": "2024-01-01", "local": "São Paulo", "valor": 100.50},
        {"data": "2024-01-02", "local": "Rio de Janeiro", "valor": 200.75},
        {"data": "2024-01-03", "local": "Belo Horizonte", "valor": 300.00},
        {"data": "2024-01-04", "local": "Curitiba", "valor": 400.25},
        {"data": "2024-01-05", "local": "Porto Alegre", "valor": 500.50},
    ],
}
df = pd.DataFrame(data)

I tried using the meta parameter, without success.

df_dados_normalized = pd.json_normalize(data = data["dados"], record_path=None, meta=data["id de transação"])


Is there a way to do this? I'd like the resulting DataFrame to include "id de transação".

pandas json-normalize
1 Answer
0 votes

meta does not accept a Series as input. When you pass a nested object together with record_path, it takes a list of keys that identify the data to keep.
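For reference, here is a minimal sketch of the kind of input where meta does apply. The records list and its "id", "nome" and "itens" keys are hypothetical and not taken from the question; the point is only that meta names keys of the outer records, while record_path names the nested list to expand.

```python
import pandas as pd

# Hypothetical nested records: each transaction holds a *list* of items.
records = [
    {
        "id": 1,
        "nome": "Alice",
        "itens": [
            {"local": "São Paulo", "valor": 100.50},
            {"local": "Curitiba", "valor": 50.00},
        ],
    },
    {
        "id": 2,
        "nome": "Bob",
        "itens": [{"local": "Rio de Janeiro", "valor": 200.75}],
    },
]

# record_path picks the nested list to expand into rows;
# meta lists the outer keys to repeat on every expanded row.
flat = pd.json_normalize(records, record_path="itens", meta=["id", "nome"])
print(flat)
```

The question's data does not have this shape: each row holds a single flat dict in "dados", and the id lives in a separate column, so there is no outer key for meta to pick up.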

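Instead, you should join the output of json_normalize to the columns you want to keep. Make sure to use set_axis to give that output the same index as df, because json_normalize will create a new one: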

out = (df[['id de transação', 'nome']]
       .join(pd.json_normalize(data=df["dados"], record_path=None)
               .set_axis(df.index)
            )
      )

Output:

   id de transação     nome        data           local   valor
0                1    Alice  2024-01-01       São Paulo  100.50
1                2      Bob  2024-01-02  Rio de Janeiro  200.75
2                3  Charlie  2024-01-03  Belo Horizonte  300.00
3                4    David  2024-01-04        Curitiba  400.25
4                5      Eve  2024-01-05    Porto Alegre  500.50
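
To see why the set_axis step matters, here is a small sketch assuming a DataFrame whose index is not the default 0..n-1 (for example after filtering). The index values 10, 20, 30 are made up for illustration, and the nested column is passed as a plain list so that the normalized frame is guaranteed to get a fresh index.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id de transação": [1, 2, 3],
        "dados": [
            {"local": "São Paulo", "valor": 100.50},
            {"local": "Rio de Janeiro", "valor": 200.75},
            {"local": "Belo Horizonte", "valor": 300.00},
        ],
    },
    index=[10, 20, 30],  # hypothetical non-default index
)

# Normalizing a plain list of dicts yields a fresh 0..n-1 index.
normalized = pd.json_normalize(df["dados"].tolist())

# join aligns on the index, so without set_axis no labels match.
misaligned = df[["id de transação"]].join(normalized)

# Reusing df's index lines the rows up again.
aligned = df[["id de transação"]].join(normalized.set_axis(df.index))

print(misaligned)
print(aligned)
```

In the misaligned result the "local" and "valor" columns are entirely NaN, because join matches rows by index label and none of 10, 20, 30 exist in the normalized frame.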