Pandas pivot table sorts categorical data alphabetically (incorrectly) when the columns parameter is added


I am having trouble with the Pandas pivot_table function. I am trying to pivot sales data by month and by year. The dataset looks like this:

Customer - Sales - Month Name   - Year
a        - 100   - january      - 2013
a        - 120   - january      - 2014
b        - 220   - january      - 2013

To sort the month names correctly, I added a column that holds the month names as categorical data.

dataset['Month'] = dataset['Month Name'].astype('category')
dataset['Month'].cat.set_categories(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'], inplace=True)
dataset.pop('Month Name')

When I call the function like this:

pt = dataset.pivot_table(values="Sales", index="Month")

I get the expected result:

Month
January      3620302.79
February     3775507.25
March        4543839.69

However, when I pivot across both year and month, the months are sorted alphabetically:

print dataset.pivot_table(values='Sales', index="Month", columns="Year", aggfunc="sum")
Year            2011        2012        2013        2014
Month                                                   
April      833692.19   954483.28  1210847.85  1210926.61
August     722604.75   735078.52   879905.23  1207211.00
December   779873.51  1053441.71  1243745.73         NaN

I would appreciate any help with ordering the month names correctly in the last code example.

python pandas pivot-table
1 Answer

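You are right: pivot_table reindexes 'Month', which is why it ends up sorted alphabetically. Luckily, you can always convert dataset['Month'] to datetime values with pd.to_datetime, and convert it back to strings after pivot_table has done its reindexing.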

It is not the best solution, but this should do the trick (I am using some random dummy data):

import pandas as pd
...
# Convert dataset['Month'] to datetime at pivot time;
# pivot_table then reindexes by datetime, so calendar order is kept.
pivoted = dataset.pivot_table(index=pd.to_datetime(dataset['Month']), columns='Year',
                              values='Sales', aggfunc='sum')
pivoted
Year        2012  2013  2014
Month                       
2014-01-04   151   295   NaN
2014-02-04   279   128   NaN
2014-03-04   218   244   NaN
2014-04-04   274   152   NaN
2014-05-04   276   NaN   138
2014-06-04   223   NaN   209
...

# Then set the index back to month-name strings; "%B" is the full month name ("January" etc.).
pivoted.index = [m.strftime('%B') for m in pivoted.index]

pivoted
Year       2012  2013  2014
January     151   295   NaN
February    279   128   NaN
March       218   244   NaN
April       274   152   NaN
May         276   NaN   138
June        223   NaN   209
...
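
For reference, the round trip above can be put together as a small self-contained sketch. The sample values are made up, and passing format="%B" to pd.to_datetime is my own assumption (it parses bare month names explicitly, defaulting the year to 1900, and assumes English month names); the answer itself calls pd.to_datetime without a format.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset
dataset = pd.DataFrame({
    "Month": ["March", "January", "February", "January", "March"],
    "Year": [2013, 2013, 2013, 2014, 2014],
    "Sales": [30, 10, 20, 40, 50],
})

# Parse month names into datetimes so the pivot sorts them chronologically
# (assumption: explicit "%B" format; the year defaults to 1900)
month_as_date = pd.to_datetime(dataset["Month"], format="%B")

pivoted = dataset.pivot_table(index=month_as_date, columns="Year",
                              values="Sales", aggfunc="sum")

# Convert the datetime index back to month-name strings
pivoted.index = [m.strftime("%B") for m in pivoted.index]

print(pivoted)
```

The rows should now come out in calendar order rather than alphabetical order.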

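However, this loses the 'Month' index label. If you need it, you can copy dataset['Month'] to another column (call it M), convert that column to datetime, and then use multiple index keys in pivot_table, like:
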

dataset.pivot_table(index=['M', 'Month'], ...)
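
A minimal sketch of that multi-index variant follows, again with made-up data. The helper column M comes from the suggestion above; the explicit format="%B" and the final droplevel step are my own assumptions, added only to show how the helper level could be discarded afterwards.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset
dataset = pd.DataFrame({
    "Month": ["March", "January", "February", "January", "March"],
    "Year": [2013, 2013, 2013, 2014, 2014],
    "Sales": [30, 10, 20, 40, 50],
})

# Helper column "M": the month parsed as a datetime, used only for ordering
dataset["M"] = pd.to_datetime(dataset["Month"], format="%B")

# Pivot on both keys; rows are sorted by "M" first, so months stay in calendar order
pivoted = dataset.pivot_table(index=["M", "Month"], columns="Year",
                              values="Sales", aggfunc="sum")

# Optional (assumption): drop the helper level, keeping the 'Month' label and the order
by_month = pivoted.droplevel("M")

print(by_month)
```

Here the index keeps its 'Month' name, because the month strings were never replaced, only accompanied by a sortable key.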