I have the following DataFrame:
import pandas as pd
data = {
    "cat": ['A'] * 3 + ['B'] * 2,
    "val1": [10, 9, 12, 20, 25],
    "val2": [1, 3, 2, 6, 7],
    "idx": [0, 1, 5, 1, 2]
}
df = pd.DataFrame(data)
df.set_index('idx', inplace=True)
This gives:
    cat  val1  val2
idx
0     A    10     1
1     A     9     3
5     A    12     2
1     B    20     6
2     B    25     7
I want to convert it into a DataFrame with MultiIndex columns:
        A           B
     val1  val2  val1  val2
idx
0      10     1    NA    NA
1       9     3    20     6
2      NA    NA    25     7
5      12     2    NA    NA
Is there a way to do this without manually splitting the table and concatenating the pieces horizontally? Thanks for the help!
import pandas as pd
data = {
    "cat": ['A'] * 3 + ['B'] * 2,
    "val1": [10, 9, 12, 20, 25],
    "val2": [1, 3, 2, 6, 7],
    "idx": [0, 1, 5, 1, 2]
}
df = pd.DataFrame(data)
df.set_index('idx', inplace=True)
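# Spread the 'cat' values across columns, keeping the existing index as rows
pivot_df = df.pivot_table(index=df.index, columns='cat', aggfunc='first')
# Flatten the (value, cat) column pairs into single names such as 'val1 A'
pivot_df.columns = [' '.join(col).strip() for col in pivot_df.columns.values]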
print(pivot_df)
The output will look like this:
     val1 A  val1 B  val2 A  val2 B
idx
0      10.0     NaN     1.0     NaN
1       9.0    20.0     3.0     6.0
2       NaN    25.0     NaN     7.0
5      12.0     NaN     2.0     NaN
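The values come back as floats (10.0 rather than 10) because NaN forces the columns to float64. If integer display with missing-value markers is preferred, one option is pandas' nullable Int64 dtype. The following is only a minimal sketch and not part of the answer above: it keeps idx as a regular column instead of the index, the name as_int is purely illustrative, and it assumes every non-missing value is a whole number.

```python
import pandas as pd

df = pd.DataFrame({
    "cat": ["A"] * 3 + ["B"] * 2,
    "val1": [10, 9, 12, 20, 25],
    "val2": [1, 3, 2, 6, 7],
    "idx": [0, 1, 5, 1, 2],
})

# Same reshaping idea as above, with idx kept as a regular column
pivot_df = df.pivot_table(index="idx", columns="cat", aggfunc="first")
pivot_df.columns = [" ".join(col) for col in pivot_df.columns]

# Assumption: all non-missing values are whole numbers, so the cast loses nothing
as_int = pivot_df.astype("Int64")
print(as_int)
```

Missing cells then show up as <NA> instead of NaN, and the present values print without a trailing .0.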
Use df.pivot, swaplevel, and sort_index. Add rename_axis to remove the column level name that the pivot introduces (i.e. "cat").
out = (
df.pivot(columns='cat')
.swaplevel(0, 1, axis=1)
.sort_index(axis=1, level=0)
.rename_axis(columns=(None, None))
)
out
        A           B
     val1  val2  val1  val2
idx
0    10.0   1.0   NaN   NaN
1     9.0   3.0  20.0   6.0
2     NaN   NaN  25.0   7.0
5    12.0   2.0   NaN   NaN
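For comparison, the same reshaping can also be spelled out with set_index and unstack, which makes each step explicit: move cat into the row index, unstack it into the columns, then reorder the column levels. This is a sketch of an alternative, not the answerer's code, and the variable name alt is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "cat": ["A"] * 3 + ["B"] * 2,
    "val1": [10, 9, 12, 20, 25],
    "val2": [1, 3, 2, 6, 7],
    "idx": [0, 1, 5, 1, 2],
}).set_index("idx")

alt = (
    df.set_index("cat", append=True)   # row index becomes (idx, cat)
      .unstack("cat")                  # cat moves to the inner column level
      .swaplevel(0, 1, axis=1)         # put cat on the outer column level
      .sort_index(axis=1)              # group the columns by cat
      .rename_axis(columns=[None, None])  # drop the 'cat' level name
)
print(alt)
```

Both versions require each (idx, cat) pair to be unique; with duplicate pairs, an aggregating call such as pivot_table is needed instead.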