将 json_normalize 与 pandas 一起使用

Question

我在 python 中使用以下代码来展平下面的 json 结构，但它并不适用于所有级别。我对下图中具体显示的tags.tags列数据感兴趣

pd.json_normalize(data['conversations'])
pd.set_option('display.max_columns', None)

Json结构：

[{'type': 'conversation.list',
  'pages': {'type': 'pages',
   'next': {'page': 3,
    'starting_after': 'WzE3MTU3ODIxOTc=='},
   'page': 2,
   'per_page': 1,
   'total_pages': 46969},
  'total_count': 46969,
  'conversations': [{'type': 'conversation',
    'id': '1384780',
    'created_at': 1715780970,
    'updated_at': 1715782197,
    'waiting_since': None,
    'snoozed_until': None,
    'source': {'type': 'conversation',
     'id': '2197597651',
     'delivered_as': 'customer_initiated',
     'subject': '',
     'body': '<p>Outros</p>',
     'author': {'type': 'user',
      'id': '64ac5cacccd2b047',
      'name': 'Claudiney Pinho',
      'email': '[email protected]'},
     'attachments': [],
     'url': None,
     'redacted': False},
    'contacts': {'type': 'contact.list',
     'contacts': [{'type': 'contact',
       'id': '64accccd71982047',
       'external_id': 'b363-cc8f--fb270b5e72e8'}]},
    'first_contact_reply': {'created_at': 1715780970,
     'type': 'conversation',
     'url': None},
    'admin_assignee_id': 5614527,
    'team_assignee_id': 5045796,
    'open': False,
    'state': 'closed',
    'read': True,
    'tags': {'type': 'tag.list',
     'tags': [{'type': 'tag',
       'id': '5379642',
       'name': '[BOT] Other',
       'applied_at': 1715781024,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '5379660',
       'name': '[BOT] Connected Agent',
       'applied_at': 1715781025,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '5379654',
       'name': '[BOT] Not Resolved',
       'applied_at': 1715781027,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '7046337',
       'name': '[BOT] Portuguese',
       'applied_at': 1715781029,
       'applied_by': {'type': 'admin', 'id': '4685750'}}]},
    'priority': 'not_priority',
    'sla_applied': None,
    'statistics': {'type': 'conversation_statistics',
     'time_to_assignment': 0,
     'time_to_admin_reply': 189,
     'time_to_first_close': 1158,
     'time_to_last_close': 1228,
     'median_time_to_reply': 139,
     'first_contact_reply_at': 1715780970,
     'first_assignment_at': 1715780970,
     'first_admin_reply_at': 1715781159,
     'first_close_at': 1715782128,
     'last_assignment_at': 1715781159,
     'last_assignment_admin_reply_at': 1715781159,
     'last_contact_reply_at': 1715782179,
     'last_admin_reply_at': 1715782125,
     'last_close_at': 1715782198,
     'last_closed_by_id': 5614527,
     'count_reopens': 1,
     'count_assignments': 3,
     'count_conversation_parts': 28},
    'conversation_rating': None,
    'teammates': {'type': 'admin.list',
     'admins': [{'type': 'admin', 'id': '5614527'}]},
    'title': None,
    'custom_attributes': {'Language': 'Portuguese',
     'Conversation status': 'Open',
     'From': 'iOS / Android'},
    'topics': {'type': 'topic.list', 'topics': [], 'total_count': 0},
    'ticket': None,
    'linked_objects': {'type': 'list',
     'data': [],
     'total_count': 0,
     'has_more': False}}]}]

数据集如何显示

我希望创建一个带有扁平键的 pandas 数据框：表中的值

Answer 1

您可以尝试使用

pd.json_normalize

函数以及 record_path 和 meta 参数。这应该将 JSON 扁平化为您选择的不同多个级别，然后您可以进一步将每个嵌套列表或字典提取到您感兴趣的列中。

我为您提供了一个代码片段，我没有使用

jupyter

，因此您可能需要测试它，但理论上这应该很好用，因为这个概念是正确的。

import pandas as pd

df = pd.json_normalize(
    data[0]["conversations"],
    record_path=["tags", "tags"],
    meta=[
        "id",
        "created_at",
        "updated_at",
        ["source", "id"],
        ["source", "author", "name"],
    ],
    meta_prefix="meta_",  # trying to avoid conflicts with available ids 
    errors="ignore",
)
pd.set_option("display.max_columns", None)

将 json_normalize 与 pandas 一起使用

问题描述投票：0回答：1

1个回答

最新问题

将 json_normalize 与 pandas 一起使用

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1