How do I reshape a DataFrame from long to wide using a "group" column?

Question

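When pivoting the following DataFrame from long to wide, I want to get "groups" of columns and label them with a prefix or suffix.

The groups can have different sizes, i.e. consist of one, two, or more grouped elements/rows. I use two elements/rows per group here to keep the example simple.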

import pandas as pd

df = pd.DataFrame(
    [
        {'group': 'group-009297534',  'single_id': 'single-011900051',  'country': 'ESP',  'name': '00000911'},
        {'group': 'group-009297534',  'single_id': 'single-000000821',  'country': 'USA',  'name': '00001054'},
        {'group': 'group-009280053',  'single_id': 'single-000000002',  'country': 'HUN',  'name': '00000496'},
        {'group': 'group-009280053',  'single_id': 'single-000000014',  'country': 'HUN',  'name': '00000795'},
        {'group': 'group-009245039',  'single_id': 'single-000001258',  'country': 'NOR',  'name': '00000527'},
        {'group': 'group-009245039',  'single_id': 'single-000000669',  'country': 'TWN',  'name': '00000535'}
    ]
)

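My approach of assigning an index to the elements within each group and then using it to specify the columns goes in the right direction, but the result still deviates from the expected view: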

df['idx'] = df.groupby('group').cumcount()
df.pivot(index='group', columns='idx')

group	('single_id', 0)	('single_id', 1)	('country', 0)	('country', 1)	('name', 0)	('name', 1)
group-009245039	single-000001258	single-000000669	NOR	TWN	00000527	00000535
group-009280053	single-000000002	single-000000014	HUN	HUN	00000496	00000795
group-009297534	single-011900051	single-000000821	ESP	USA	00000911	00001054

However, the expected result would look like this:

	group	single_id_1	country_1	name_1	single_id_2	country_2	name_2
0	group-009245039	single-000001258	NOR	00000527	single-000000669	TWN	00000535
1	group-009280053	single-000000002	HUN	00000496	single-000000014	HUN	00000795
2	group-009297534	single-011900051	ESP	00000911	single-000000821	USA	00001054

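I am not sure whether the MultiIndex approach (whose columns would then have to be sorted and merged somehow) is the right way, or whether there is a more elegant option.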

Answer 1

Is this what you are looking for?

import pandas as pd

df = pd.DataFrame([
    {'group': 'group-009297534',  'single_id': 'single-011900051',  'country': 'ESP',  'name': '00000911'},
    {'group': 'group-009297534',  'single_id': 'single-000000821',  'country': 'USA',  'name': '00001054'},
    {'group': 'group-009280053',  'single_id': 'single-000000002',  'country': 'HUN',  'name': '00000496'},
    {'group': 'group-009280053',  'single_id': 'single-000000014',  'country': 'HUN',  'name': '00000795'},
    {'group': 'group-009245039',  'single_id': 'single-000001258',  'country': 'NOR',  'name': '00000527'},
    {'group': 'group-009245039',  'single_id': 'single-000000669',  'country': 'TWN',  'name': '00000535'}
])

df['idx'] = (df.groupby('group').cumcount() + 1).astype(str)


# Pivot the DataFrame with the new 'idx' for differentiation
df_pivoted = df.pivot(index='group', columns='idx')

# Flatten the MultiIndex and format column names
df_pivoted.columns = [f'{x[0]}_{x[1]}' for x in df_pivoted.columns]

# Reset the index to bring 'group' back as a column
df_pivoted.reset_index(inplace=True)

# Optional: Reorder the columns according to your expected output
# This assumes you know the order and number of groups
expected_order = [
    'group', 
    'single_id_1', 'country_1', 'name_1',
    'single_id_2', 'country_2', 'name_2'
]
df_pivoted = df_pivoted[expected_order]

print(df_pivoted)

Output:

             group       single_id_1 country_1    name_1       single_id_2  \
0  group-009245039  single-000001258       NOR  00000527  single-000000669   
1  group-009280053  single-000000002       HUN  00000496  single-000000014   
2  group-009297534  single-011900051       ESP  00000911  single-000000821   

  country_2    name_2  
0       TWN  00000535  
1       HUN  00000795  
2       USA  00001054
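
The hard-coded `expected_order` list above only works when the maximum number of rows per group is known in advance. Since the question mentions that groups can differ in size, here is a minimal sketch that derives the column order from the data instead. The names `value_cols`, `positions`, and `ordered` are only illustrative, and the data is trimmed to three rows of the original example so that one group has a single row.

```python
import pandas as pd

# Trimmed sample: the first group has two rows, the second only one.
df = pd.DataFrame([
    {'group': 'group-009297534', 'single_id': 'single-011900051', 'country': 'ESP', 'name': '00000911'},
    {'group': 'group-009297534', 'single_id': 'single-000000821', 'country': 'USA', 'name': '00001054'},
    {'group': 'group-009280053', 'single_id': 'single-000000002', 'country': 'HUN', 'name': '00000496'},
])

# Columns to spread out, in their original order.
value_cols = [c for c in df.columns if c != 'group']

# 1-based position of each row within its group.
idx = df.groupby('group').cumcount() + 1

# Pivot: columns become a MultiIndex of (value column, position).
wide = df.assign(idx=idx).pivot(index='group', columns='idx')

# Build the desired order: all value columns for position 1, then position 2, ...
positions = sorted(idx.unique())
ordered = [(col, pos) for pos in positions for col in value_cols]
wide = wide.reindex(columns=pd.MultiIndex.from_tuples(ordered))

# Flatten the MultiIndex into 'column_position' labels and restore 'group'.
wide.columns = [f'{col}_{pos}' for col, pos in wide.columns]
wide = wide.reset_index()

print(wide)
```

Groups with fewer rows than the largest group end up with NaN in the trailing `*_2`, `*_3`, ... columns, which may or may not be what you want.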
