I'm having trouble with the Pandas pivot function. I'm trying to pivot sales data by month and by year. The dataset looks like this:
Customer - Sales - Month Name - Year
a - 100 - january - 2013
a - 120 - january - 2014
b - 220 - january - 2013
To sort the month names correctly, I added a column that holds the month names as categorical data.
dataset['Month'] = dataset['Month Name'].astype('category')
dataset['Month'] = dataset['Month'].cat.set_categories(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])
dataset.pop('Month Name')
When I use the function:
pt = dataset.pivot_table(values="Sales", index="Month")
I get the expected result:
Month
January 3620302.79
February 3775507.25
March 4543839.69
However, when I pivot across both year and month, the months are sorted alphabetically:
print(dataset.pivot_table(values='Sales', index="Month", columns="Year", aggfunc="sum"))
Year 2011 2012 2013 2014
Month
April 833692.19 954483.28 1210847.85 1210926.61
August 722604.75 735078.52 879905.23 1207211.00
December 779873.51 1053441.71 1243745.73 NaN
I would appreciate any help getting the month names sorted correctly in the last code example.
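The problem appears right after pivot_table, which reindexes 'Month' and therefore sorts it alphabetically. Fortunately, you can always convert dataset['Month'] to datetime values with pd.to_datetime and convert them back to strings after pivot_table has reindexed.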
It's not the best solution, but it should do the trick (I'm using some random dummy data):
import pandas as pd
...
# convert dataset['Month'] to pandas.datetime by the time of pivot
# it will reindex by datetime hence the sort order is kept
pivoted = dataset.pivot_table(index=pd.to_datetime(dataset['Month']), columns='Year', \
values='Sales', aggfunc='sum')
pivoted
Year 2012 2013 2014
Month
2014-01-04 151 295 NaN
2014-02-04 279 128 NaN
2014-03-04 218 244 NaN
2014-04-04 274 152 NaN
2014-05-04 276 NaN 138
2014-06-04 223 NaN 209
...
# then re-set the index back to Month string, "%B" means month string "January" etc.
pivoted.index = [m.strftime('%B') for m in pivoted.index]
pivoted
Year 2012 2013 2014
January 151 295 NaN
February 279 128 NaN
March 218 244 NaN
April 274 152 NaN
May 276 NaN 138
June 223 NaN 209
...
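For reference, here is a minimal self-contained sketch of the round trip above. The toy DataFrame is made up for illustration, and passing format="%B" to pd.to_datetime is my assumption for parsing bare month names explicitly (every value then lands on the first day of that month in 1900). The month names produced by strftime also assume an English locale.

```python
import pandas as pd

# Hypothetical toy data standing in for the real sales dataset;
# rows are deliberately not in calendar order.
dataset = pd.DataFrame({
    "Customer": ["a", "a", "b", "a", "b"],
    "Sales": [50, 100, 220, 120, 75],
    "Month": ["April", "January", "February", "January", "April"],
    "Year": [2013, 2013, 2013, 2014, 2014],
})

# astype(str) also covers the case where "Month" is a categorical column.
# format="%B" (an assumption, see above) parses full month names.
month_as_date = pd.to_datetime(dataset["Month"].astype(str), format="%B")

# Pivot on the datetime values so the rows come out in calendar order.
pivoted = dataset.pivot_table(index=month_as_date, columns="Year",
                              values="Sales", aggfunc="sum")

# Turn the sorted datetime index back into month-name strings.
pivoted.index = [m.strftime("%B") for m in pivoted.index]
print(pivoted)
```

The rows should come out as January, February, April rather than in alphabetical order.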
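However, you will lose the 'Month' index label. If you need it, you can copy dataset['Month'] to another column (call it M), convert that column to datetime, and then set multiple indexes in pivot_table, like: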
dataset.pivot_table(index=['M', 'Month'], ...)
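A rough sketch of that multi-index variant, again on made-up data. The format="%B" argument and the final droplevel call are my additions and not part of the suggestion above: the former parses bare month names explicitly, the latter removes the helper level once the rows are in calendar order.

```python
import pandas as pd

# Hypothetical toy data; rows are deliberately not in calendar order.
dataset = pd.DataFrame({
    "Customer": ["a", "a", "b", "a", "b"],
    "Sales": [50, 100, 220, 120, 75],
    "Month": ["April", "January", "February", "January", "April"],
    "Year": [2013, 2013, 2013, 2014, 2014],
})

# "M" is the helper column: a datetime copy of "Month" used only for sorting.
dataset["M"] = pd.to_datetime(dataset["Month"].astype(str), format="%B")

# Index on both columns: "M" drives the sort order, "Month" keeps the label.
labelled = dataset.pivot_table(index=["M", "Month"], columns="Year",
                               values="Sales", aggfunc="sum")

# Optional (my assumption): drop the helper level, leaving a "Month" index.
labelled = labelled.droplevel("M")
print(labelled)
```

With this variant the index keeps its "Month" name, so the label is not lost.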