I am trying to replicate a pivot table in pandas, using the sample dataset below, which is taken from a larger table:
year type status paid balance count
2000 bank1 active 15 21 1
2001 bank2 default 27 40 1
2002 bank3 payoff 35 150 1
2003 bank4 closed 20 80 1
. . . . . .
. . . . . .
. . . . . .
I currently build the pivot table like this:
pivot = (pd.pivot_table(df,
                        index=['status', 'type'],
                        values=['paid', 'balance', 'count'],
                        aggfunc="sum")
         .reset_index()
         .rename_axis(None, axis=1))
This outputs:
status  type   paid  balance  count
active  bank1   500      850      6
        bank2   450      800      8
        bank3   225      940     11
        bank4   580      990     15
payoff  bank1   xxx      xxx    xxx
        bank2   xxx      xxx    xxx
        bank3   xxx      xxx    xxx
        bank4   xxx      xxx    xxx
.
.
.
closed  bank1   xxx      xxx    xxx
        bank2   xxx      xxx    xxx
        bank3   xxx      xxx    xxx
        bank4   xxx      xxx    xxx
I want this output instead (the same as an Excel pivot table):
type      paid  balance  count
active    1755     3580     40   (subtotal for active)
  bank1    500      850      6
  bank2    450      800      8
  bank3    225      940     11
  bank4    580      990     15
payoff     xxx      xxx    xxx   (subtotal)
  bank1    xxx      xxx    xxx
  bank2    xxx      xxx    xxx
  bank3    xxx      xxx    xxx
  bank4    xxx      xxx    xxx
.
.
.
closed     xxx      xxx    xxx   (subtotal)
  bank1    xxx      xxx    xxx
  bank2    xxx      xxx    xxx
  bank3    xxx      xxx    xxx
  bank4    xxx      xxx    xxx
To do this, first create the pivot table as before, then write a function that computes the subtotals:
import pandas as pd

data = {
    'year': [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003],
    'type': ['bank1', 'bank2', 'bank3', 'bank4', 'bank1', 'bank2', 'bank3', 'bank4'],
    'status': ['active', 'active', 'active', 'active', 'payoff', 'payoff', 'payoff', 'payoff'],
    'paid': [15, 27, 35, 20, 10, 20, 30, 25],
    'balance': [21, 40, 150, 80, 10, 40, 60, 70],
    'count': [1, 1, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

pivot = pd.pivot_table(df, index=['status', 'type'],
                       values=['paid', 'balance', 'count'], aggfunc='sum')

def add_totals(pivot):
    """Return the pivot with a subtotal row placed above each status group."""
    parts = []
    for status, group in pivot.groupby(level=0):
        # Sum every column of the group and turn the result into a one-row frame.
        total_row = group.sum().to_frame().T
        # Label the row (status, 'total'); the column names stay unchanged,
        # so the subtotals line up with the detail rows.
        total_row.index = pd.MultiIndex.from_tuples(
            [(status, 'total')], names=pivot.index.names)
        # Subtotal first, then the group's detail rows.
        parts.extend([total_row, group])
    return pd.concat(parts)

pivot_with_totals = add_totals(pivot)
print(pivot_with_totals)

This gives you:

              balance  count  paid
status type
active total      291      4    97
       bank1       21      1    15
       bank2       40      1    27
       bank3      150      1    35
       bank4       80      1    20
payoff total      180      4    85
       bank1       10      1    10
       bank2       40      1    20
       bank3       60      1    30
       bank4       70      1    25
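
If you want something closer to the Excel layout from the question, with a single label column where the status name marks the subtotal row, the following is a minimal sketch of one way to do it. It is self-contained and reuses the sample data from above. The helper name `excel_style`, its parameters, and the two-space indent for detail rows are my own choices for illustration, not anything pandas provides.

```python
import pandas as pd

# Same sample data as above (the 'year' column is not needed here).
df = pd.DataFrame({
    'type': ['bank1', 'bank2', 'bank3', 'bank4'] * 2,
    'status': ['active'] * 4 + ['payoff'] * 4,
    'paid': [15, 27, 35, 20, 10, 20, 30, 25],
    'balance': [21, 40, 150, 80, 10, 40, 60, 70],
    'count': [1] * 8,
})

def excel_style(df, outer='status', inner='type',
                values=('paid', 'balance', 'count')):
    """Hypothetical helper: one label column, subtotal row first in each group."""
    values = list(values)
    # Aggregate to one row per (outer, inner) pair, keeping both as columns.
    detail = df.groupby([outer, inner], as_index=False)[values].sum()
    parts = []
    for key, group in detail.groupby(outer):
        # One-row frame holding the group's subtotals.
        subtotal = group[values].sum().to_frame().T
        # The outer key (e.g. 'active') becomes the label of the subtotal row.
        subtotal.insert(0, inner, key)
        # Detail rows: keep the inner label, indented to mimic Excel.
        rows = group[[inner] + values].copy()
        rows[inner] = '  ' + rows[inner]
        parts.extend([subtotal, rows])
    return pd.concat(parts, ignore_index=True)

result = excel_style(df)
print(result.to_string(index=False))
```

With the sample data, the first row is labelled `active` and holds paid 97, balance 291, count 4, followed by the four indented bank rows; the `payoff` block follows the same pattern. Looping over groups is fine for a report-sized table. On a very large table you may prefer to compute all subtotals with a single `groupby(outer)` and sort them into place.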