Subtotals for every column combination in Python Pandas

Question

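I currently have a sample dataframe that looks like this:

date	subject	animal	colors	value
Jan	English	cat	blue	1
Feb	Chemistry	dog	green	2

Assume the values shown above are the only unique values in each column.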
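For reproducibility, the sample frame can be built as in the sketch below. The column names follow the ones used in the answer's code (date, subject, animal, colors, value).

```python
import pandas as pd

# Sample frame from the question, one row per date.
df = pd.DataFrame({
    "date": ["Jan", "Feb"],
    "subject": ["English", "Chemistry"],
    "animal": ["cat", "dog"],
    "colors": ["blue", "green"],
    "value": [1, 2],
})
print(df)
```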

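I'm having trouble using a pivot table to build a new dataframe with multi-level column headers, using aggfunc=median, that also includes a "subtotal" for every column combination. Here "subtotal" means that all categories of that column are included in the aggregation.

For example, I would like the resulting dataframe to look like the one below, where "all" stands for the grouping of every category in that particular column.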

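subject	English	English	English	English	English	English	English	English	English	all
animal	cat	cat	cat	dog	dog	dog	all	all	all	cat
colors	blue	green	all	blue	green	all	blue	green	all	blue
date
Jan
Feb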

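In the end there should be 27 columns, because:

number of unique subjects + 1 subtotal = 3
number of unique animals + 1 subtotal = 3
number of unique colors + 1 subtotal = 3

3 × 3 × 3 = 27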
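The count above can be checked by enumerating the expected column headers directly. This is only a sketch of the arithmetic; the lists are typed in by hand from the sample data, with "all" appended as the subtotal label.

```python
from itertools import product

# Each level has its two unique values plus the "all" subtotal label.
subjects = ["English", "Chemistry", "all"]
animals = ["cat", "dog", "all"]
colors = ["blue", "green", "all"]

# Cartesian product of the three levels gives every expected column header.
expected_columns = list(product(subjects, animals, colors))
print(len(expected_columns))  # 27
```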

Answer 1

IIUC, you can convert the columns to Categorical before the pivot:

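from itertools import combinations

import pandas as pd

# df is the question's sample frame, with columns named
# date, subject, animal, colors, value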

# Transform discrete columns as categorical features
cols = ['date', 'subject', 'animal', 'colors']
cats = {col: pd.CategoricalDtype(list(df[col].unique()) + ['all'], ordered=True)
           for col in cols}
df = df.astype(cats)

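# Compute intermediate subtotals: each pass leaves exactly one column out of
# the grouping; the missing column is filled with 'all' after the concat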
data = []
for grp in combinations(cols, r=len(cols)-1):
    df1 = df.groupby(list(grp), as_index=False, observed=True)['value'].sum()
    data.append(df1)
out = pd.concat([df, *data]).fillna('all')

# Reshape your dataframe to get all combinations
out = out.pivot_table(index='date', columns=['subject', 'animal', 'colors'], 
                      values='value', fill_value=-1, aggfunc='sum', observed=False)

Bonus: you now also have the column subtotals:

>>> out
subject English                                         Chemistry                                          all                                        
animal      cat            dog            all                 cat            dog            all            cat            dog            all          
colors     blue green all blue green all blue green all      blue green all blue green all blue green all blue green all blue green all blue green all
date                                                                                                                                                  
Jan           1     0   1    0     0   0    1     0   0         0     0   0    0     0   0    0     0   0    1     0   0    0     0   0    0     0   0
Feb           0     0   0    0     0   0    0     0   0         0     0   0    0     2   2    0     2   0    0     0   0    0     2   0    0     0   0
all           1     0   0    0     0   0    0     0   0         0     0   0    0     2   0    0     0   0    0     0   0    0     0   0    0     0   0
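Note that the code above collapses only one key at a time and aggregates with sum, whereas the question asks for the median over every combination of keys. The sketch below is one possible way to cover all 27 columns. It is not the answer's method: it swaps in a different technique, stacking one copy of the frame per subset of keys with that subset overwritten by "all", and then pivoting once. The names keys, frames, stacked, and wide are made up for illustration, and the frame is the sample from the question.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "date": ["Jan", "Feb"],
    "subject": ["English", "Chemistry"],
    "animal": ["cat", "dog"],
    "colors": ["blue", "green"],
    "value": [1, 2],
})
keys = ["subject", "animal", "colors"]

# One copy of the frame per subset of keys (including the empty subset),
# with every key in that subset overwritten by the label "all".
frames = []
for r in range(len(keys) + 1):
    for masked in combinations(keys, r):
        part = df.copy()
        for col in masked:
            part[col] = "all"
        frames.append(part)
stacked = pd.concat(frames, ignore_index=True)

# A single pivot now aggregates each subtotal level with the median.
wide = stacked.pivot_table(index="date", columns=keys,
                           values="value", aggfunc="median")

# Reindex to the full 3 x 3 x 3 grid so unobserved combinations show as NaN,
# and restore the original date order.
levels = [list(df[k].unique()) + ["all"] for k in keys]
full = pd.MultiIndex.from_product(levels, names=keys)
wide = wide.reindex(columns=full).reindex(list(df["date"].unique()))
print(wide.shape)  # (2, 27)
```

Because every original row appears exactly once in each masked copy, each "all" column aggregates over all rows that match the remaining keys, so the median is taken over the raw values rather than over previously aggregated subtotals.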
