Split a DataFrame and stack the pieces horizontally


I have the following DataFrame:

import pandas as pd
data = {
   "cat": ['A'] * 3 + ['B'] * 2,   
   "val1": [10, 9, 12, 20, 25],
   "val2": [1, 3, 2, 6, 7],
   "idx": [0, 1, 5, 1, 2]
}

df = pd.DataFrame(data)
df.set_index('idx', inplace=True)

This gives

      cat     val1   val2
idx         
0       A       10      1
1       A        9      3
5       A       12      2
1       B       20      6
2       B       25      7

I would like to convert it into a DataFrame with MultiIndex columns:

        A              B
     val1   val2    val1    val2
idx         
0      10      1      NA      NA
1       9      3      20       6
2      NA     NA      25       7
5      12      2      NA      NA

Is there a way to do this without manually splitting the table and concatenating the pieces horizontally? Thanks for the help!

pandas dataframe pivot stack transform
2 answers
0
votes
import pandas as pd

data = {
   "cat": ['A'] * 3 + ['B'] * 2,   
   "val1": [10, 9, 12, 20, 25],
   "val2": [1, 3, 2, 6, 7],
   "idx": [0, 1, 5, 1, 2]
}

df = pd.DataFrame(data)
df.set_index('idx', inplace=True)

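# Move 'cat' into the columns; aggfunc='first' keeps the first value per (idx, cat) pair
pivot_df = df.pivot_table(index=df.index, columns='cat', aggfunc='first')

# Flatten the (value, cat) column MultiIndex into single strings such as 'val1 A'
pivot_df.columns = [' '.join(col).strip() for col in pivot_df.columns.values]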

print(pivot_df)

The output looks like this:

     val1 A  val1 B  val2 A  val2 B
idx
0      10.0     NaN     1.0     NaN
1       9.0    20.0     3.0     6.0
2       NaN    25.0     NaN     7.0
5      12.0     NaN     2.0     NaN
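
A side note on why `aggfunc='first'` matters here: `pivot_table` aggregates rows that share the same (idx, cat) pair, whereas `DataFrame.pivot` refuses to reshape them. The sketch below uses a made-up variant of the data (the `dup` frame and its values are not from the question) in which idx 1 appears twice for category A, so treat it as an illustration rather than part of the answer.

```python
import pandas as pd

# Hypothetical data, not from the question: (idx=1, cat='A') occurs twice.
dup = pd.DataFrame({
    "cat": ["A", "A", "B"],
    "val1": [9, 99, 20],
    "val2": [3, 33, 6],
    "idx": [1, 1, 1],
})

# pivot_table groups by (idx, cat); 'first' keeps the first row of each group.
kept = dup.pivot_table(index="idx", columns="cat", aggfunc="first")
print(kept)

# pivot does no aggregation, so a repeated (idx, cat) pair raises ValueError.
try:
    dup.pivot(index="idx", columns="cat")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)
print(pivot_error)
```

With the question's original data every (idx, cat) pair is unique, so both methods return the same values.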

0
votes

You can try df.pivot followed by swaplevel and sort_index. Add df.rename_axis to remove the column-level name that pivot adds (i.e. "cat").

out = (
    df.pivot(columns='cat')
    .swaplevel(0, 1, axis=1)
    .sort_index(axis=1, level=0)
    .rename_axis(columns=(None, None))
)

out

        A          B     
     val1 val2  val1 val2
idx                      
0    10.0  1.0   NaN  NaN
1     9.0  3.0  20.0  6.0
2     NaN  NaN  25.0  7.0
5    12.0  2.0   NaN  NaN
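
To see what each step in the chain contributes, here is a small sketch that repeats the same calls one at a time and prints the column labels after each. The intermediate names `step1` to `step4` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "cat": ["A"] * 3 + ["B"] * 2,
    "val1": [10, 9, 12, 20, 25],
    "val2": [1, 3, 2, 6, 7],
    "idx": [0, 1, 5, 1, 2],
}).set_index("idx")

# 1. pivot: columns become (value, cat) pairs, with 'cat' as the inner level.
step1 = df.pivot(columns="cat")
print(list(step1.columns))
# [('val1', 'A'), ('val1', 'B'), ('val2', 'A'), ('val2', 'B')]

# 2. swaplevel: 'cat' becomes the outer level, but the order is unchanged.
step2 = step1.swaplevel(0, 1, axis=1)
print(list(step2.columns))
# [('A', 'val1'), ('B', 'val1'), ('A', 'val2'), ('B', 'val2')]

# 3. sort_index: groups the columns so each category's values sit together.
step3 = step2.sort_index(axis=1, level=0)
print(list(step3.columns))
# [('A', 'val1'), ('A', 'val2'), ('B', 'val1'), ('B', 'val2')]

# 4. rename_axis: drops the leftover 'cat' label from the column header.
step4 = step3.rename_axis(columns=(None, None))
print(list(step4.columns.names))
# [None, None]
```

The values come out as floats because the missing (idx, cat) combinations are filled with NaN.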