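I am trying to reshape a pandas DataFrame by turning the values of one column into separate columns (by pivoting or unstacking).
I am new to this, so I have most likely missed something obvious. I have searched extensively but have not managed to apply any of the solutions I found.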
df
Location Month Metric Value
0 Texas January Temperature 10
1 New York January Temperature 20
2 California January Temperature 30
3 Alaska January Temperature 40
4 Texas January Color Red
5 New York January Color Blue
6 California January Color Green
7 Alaska January Color Yellow
8 Texas February Temperature 15
9 New York February Temperature 25
10 California February Temperature 35
11 Alaska February Temperature NaN
12 Texas February Color NaN
13 New York February Color Purple
14 California February Color Orange
15 Alaska February Color Brown
I am trying to pivot the Metric values into columns. The end goal is a result like this:
Location Month Temperature Color
Texas January 10 Red
New York January 20 Blue
California January 30 Green
Alaska January 40 Yellow
Texas February 15
New York February 25 Purple
California February 35 Orange
Alaska February Brown
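I have tried pivot, pivot_table, and unstack, but I am sure I am missing something. Much of the difficulty seems to come from mixing strings and numbers in the same column, along with some missing values in the data.
This is the closest I have gotten so far, but I don't want the extra rows alongside the Month column, which produce more empty values: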
df.set_index(['Location','Month','Metric'], append=True, inplace=True)
df.unstack()
Value
Metric Color Temperature
Location Month
0 Texas January None 10
1 New York January None 20
2 California January None 30
3 Alaska January None 40
4 Texas January Red None
5 New York January Blue None
6 California January Green None
7 Alaska January Yellow None
Any help here would be greatly appreciated. It seems likely that there is a simple solution.
A pivot solution that meets your needs. The output is semantically what you want:
Metric Color Temperature
Location Month
Alaska February Brown NaN
January Yellow 40
California February Orange 35
January Green 30
New York February Purple 25
January Blue 20
Texas February NaN 15
January Red 10
Code:
import numpy as np  # needed for np.sum below
df_p = df.pivot_table(index=['Location', 'Month'], columns=['Metric'], values='Value', aggfunc=np.sum)
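Since every (Location, Month, Metric) group here holds a single value, the aggregation function only needs to pass that value through. As a minimal sketch, assuming the sample data from the question, the same pivot_table call can be written with aggfunc='first' instead of np.sum. That swap is a suggestion rather than part of the answer above; it avoids relying on how strings are summed, and newer pandas versions may warn when np.sum is passed as a callable. The DataFrame construction below is only a reconstruction of the question's table for illustration.

```python
import pandas as pd

nan = float('nan')
locations = ['Texas', 'New York', 'California', 'Alaska']

# Reconstruction of the question's sample data (16 rows, mixed value types)
df = pd.DataFrame({
    'Location': locations * 4,
    'Month': ['January'] * 8 + ['February'] * 8,
    'Metric': (['Temperature'] * 4 + ['Color'] * 4) * 2,
    'Value': [10, 20, 30, 40, 'Red', 'Blue', 'Green', 'Yellow',
              15, 25, 35, nan, nan, 'Purple', 'Orange', 'Brown'],
})

# 'first' returns the first non-missing value in each group,
# so strings and numbers are passed through unchanged
df_p = df.pivot_table(index=['Location', 'Month'], columns='Metric',
                      values='Value', aggfunc='first')
print(df_p)
```

The result keeps Location and Month as a MultiIndex; calling reset_index() on it turns them back into ordinary columns if a flat table is preferred.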
For correct sorting, first convert the Month column to an ordered categorical, then reshape with set_index and unstack:
# add the remaining months to this list, in calendar order
cats = ['January','February']
df['Month'] = pd.Categorical(df['Month'], categories=cats, ordered=True)
Or, if the months already appear in order in the original column (thanks @asongtoruin):
df['Month'] = pd.Categorical(df['Month'], categories=df['Month'].unique(), ordered=True)
df = (df.set_index(['Location','Month','Metric'])['Value']
.unstack()
.reset_index()
.rename_axis(None, axis=1)
.sort_values('Month'))
print(df)
Location Month Color Temperature
0 Alaska January Yellow 40
2 California January Green 30
4 New York January Blue 20
6 Texas January Red 10
1 Alaska February Brown NaN
3 California February Orange 35
5 New York February Purple 25
7 Texas February NaN 15
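As for why the attempt in the question kept one row per original record: append=True adds the key columns to the existing integer row index instead of replacing it, so every row stays unique and unstack has nothing to merge. The following is a small sketch on a four-row subset of the question's data (the subset and the variable names with_append and without_append are made up for illustration) that compares both variants.

```python
import pandas as pd

# Four-row subset of the question's data, for illustration only
df = pd.DataFrame({
    'Location': ['Texas', 'New York', 'Texas', 'New York'],
    'Month': ['January'] * 4,
    'Metric': ['Temperature', 'Temperature', 'Color', 'Color'],
    'Value': [10, 20, 'Red', 'Blue'],
})
keys = ['Location', 'Month', 'Metric']

# append=True keeps the original 0..3 row labels as an extra index level,
# so Temperature and Color for the same place end up on different rows
with_append = df.set_index(keys, append=True)['Value'].unstack()

# Without append, rows are identified by (Location, Month) only,
# so the two metrics for each pair land on the same row
without_append = df.set_index(keys)['Value'].unstack()

print(with_append.shape)     # one row per original record
print(without_append.shape)  # one row per (Location, Month) pair
```

Dropping append=True (or using pivot_table) is therefore enough to collapse the Temperature and Color rows into a single row per location and month.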