Pandas GroupBy - show only groups with more than one unique feature value

Question (votes: 0, answers: 3)

I have a DataFrame, df_things, that looks like this, and I want to gauge the quality of a classification before training:

A    B     C      CLASS
-----------------------
al1  bal1  cal1   Ship
al1  bal1  cal1   Ship
al1  bal2  cal2   Ship
al2  bal2  cal2   Cow
al3  bal3  cal3   Car
al1  bal2  cal3   Car
al3  bal3  cal3   Car

I want to group the rows by class so that I can see how the features are distributed. I do this (for column 'B', for example):

df_B = df_things.groupby('CLASS').B.value_counts()

This gives me the result:

CLASS  B 
-------------
ship   bal1  2 
       bal2  1
cow    bal2  2
car    bal2  1
       bal3  2

What I want is to display only the groups that have more than one value, so that it looks like this:

CLASS  B 
-------------
ship   bal1  2 
       bal2  1
car    bal2  1
       bal3  2

I'm a bit stuck. Any ideas?

python pandas compare unique pandas-groupby
3 Answers
4
votes

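You can group the value_counts result by its first index level and keep only the groups with more than one entry. value_counts already yields one row per distinct B value within each class, so the group size is the number of unique values. (transform('nunique') would instead count distinct counts, and would drop a class whose B values all occur equally often.)

v = df_things.groupby('CLASS').B.value_counts()
v[v.groupby(level=0).transform('size').gt(1)]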

CLASS  B   
Car    bal3    2
       bal2    1
Ship   bal1    2
       bal2    1
Name: B, dtype: int64
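To see why the group size is a safer test than the number of distinct counts, here is a minimal sketch on made-up data. The `df_ties` frame is hypothetical and not from the question; it only assumes a class whose B values happen to occur equally often.

```python
import pandas as pd

# Hypothetical data: class "Ship" has two distinct B values with equal counts.
df_ties = pd.DataFrame({
    'B': ['bal1', 'bal1', 'bal2', 'bal2', 'bal3'],
    'CLASS': ['Ship', 'Ship', 'Ship', 'Ship', 'Cow'],
})

v = df_ties.groupby('CLASS').B.value_counts()

# Distinct *counts* per class: Ship has counts [2, 2], i.e. one distinct count.
by_nunique = v[v.groupby(level=0).transform('nunique').gt(1)]

# Rows per class in v, i.e. distinct B values per class: Ship has 2.
by_size = v[v.groupby(level=0).transform('size').gt(1)]

print(len(by_nunique))  # 0
print(len(by_size))     # 2
```

On the question's data both variants give the same result, because each kept class happens to have differing counts.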

2
votes

A solution using crosstab:
import numpy as np
import pandas as pd
s = pd.crosstab(df_things.CLASS, df_things.B)
s[s.ne(0).sum(axis=1) > 1].replace(0, np.nan).stack().dropna()
CLASS  B   
Car    bal2    1.0
       bal3    2.0
Ship   bal1    2.0
       bal2    1.0
dtype: float64
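If the one-liner is hard to follow, the intermediate steps can be sketched as below. This is only an illustration on the question's B and CLASS columns; the names `distinct_per_class` and `kept` are made up here.

```python
import pandas as pd

df_things = pd.DataFrame({
    'B': ['bal1', 'bal1', 'bal2', 'bal2', 'bal3', 'bal2', 'bal3'],
    'CLASS': ['Ship', 'Ship', 'Ship', 'Cow', 'Car', 'Car', 'Car'],
})

# Rows are classes, columns are B values, cells are occurrence counts.
s = pd.crosstab(df_things['CLASS'], df_things['B'])

# Non-zero cells per row = number of distinct B values seen in that class.
distinct_per_class = s.ne(0).sum(axis=1)

# Keep only the classes with more than one distinct B value.
kept = s[distinct_per_class > 1]
print(distinct_per_class)
print(list(kept.index))  # ['Car', 'Ship']
```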

0
votes

Here is another approach.

Set up the input data:

In [1]:
import pandas as pd
df_things = pd.DataFrame({
    'A': ['al1', 'al1', 'al1', 'al2', 'al3', 'al1', 'al3'],
    'B': ['bal1', 'bal1', 'bal2', 'bal2', 'bal3', 'bal2', 'bal3'],
    'C': ['cal1', 'cal1', 'cal2', 'cal2', 'cal3', 'cal3', 'cal3'],
    'CLASS': ['Ship', 'Ship', 'Ship', 'Cow', 'Car', 'Car', 'Car']
})
print(df_things)
     A     B     C CLASS
0  al1  bal1  cal1  Ship
1  al1  bal1  cal1  Ship
2  al1  bal2  cal2  Ship
3  al2  bal2  cal2   Cow
4  al3  bal3  cal3   Car
5  al1  bal2  cal3   Car
6  al3  bal3  cal3   Car

Reduce it to the groups that have more than one unique value:

In [2]:
df_reduced = df_things.groupby(['CLASS']).filter(lambda grp: grp['B'].nunique() > 1)
print(df_reduced)
     A     B     C CLASS
0  al1  bal1  cal1  Ship
1  al1  bal1  cal1  Ship
2  al1  bal2  cal2  Ship
4  al3  bal3  cal3   Car
5  al1  bal2  cal3   Car
6  al3  bal3  cal3   Car
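As a possible alternative to the lambda-based filter, the same rows can be selected with a boolean mask built from transform('nunique'). This is a sketch, not a benchmark; the variable name `n_unique_b` is made up for illustration.

```python
import pandas as pd

df_things = pd.DataFrame({
    'A': ['al1', 'al1', 'al1', 'al2', 'al3', 'al1', 'al3'],
    'B': ['bal1', 'bal1', 'bal2', 'bal2', 'bal3', 'bal2', 'bal3'],
    'C': ['cal1', 'cal1', 'cal2', 'cal2', 'cal3', 'cal3', 'cal3'],
    'CLASS': ['Ship', 'Ship', 'Ship', 'Cow', 'Car', 'Car', 'Car'],
})

# For every row, the number of distinct B values within that row's class.
n_unique_b = df_things.groupby('CLASS')['B'].transform('nunique')

# Keep rows whose class has more than one distinct B value.
df_reduced = df_things[n_unique_b > 1]
print(df_reduced.index.tolist())  # [0, 1, 2, 4, 5, 6]
```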

Apply groupby to get the desired output:

In [3]:
df_reduced.groupby(['CLASS'])['B'].value_counts()
Out[3]:
CLASS  B
Car    bal3    2
       bal2    1
Ship   bal1    2
       bal2    1
Name: B, dtype: int64
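Since the question is about the distribution of features in general, the two steps could be wrapped in a small helper and run for each feature column. The function name `counts_for_mixed_groups` and its parameters are hypothetical, just one way to package it.

```python
import pandas as pd

def counts_for_mixed_groups(df, feature, by='CLASS'):
    """Value counts of `feature` per `by` group, limited to groups
    that contain more than one distinct value of `feature`."""
    mask = df.groupby(by)[feature].transform('nunique') > 1
    return df[mask].groupby(by)[feature].value_counts()

df_things = pd.DataFrame({
    'A': ['al1', 'al1', 'al1', 'al2', 'al3', 'al1', 'al3'],
    'B': ['bal1', 'bal1', 'bal2', 'bal2', 'bal3', 'bal2', 'bal3'],
    'C': ['cal1', 'cal1', 'cal2', 'cal2', 'cal3', 'cal3', 'cal3'],
    'CLASS': ['Ship', 'Ship', 'Ship', 'Cow', 'Car', 'Car', 'Car'],
})

# Print the filtered distribution for each feature column in turn.
for col in ['A', 'B', 'C']:
    print(counts_for_mixed_groups(df_things, col))
```

For column A only Car has more than one distinct value, and for column C only Ship does, so each printout contains just that class.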

By the way, the df_B output in your question contains a typo. It should be:

In [4]:
df_B = df_things.groupby('CLASS').B.value_counts()
print(df_B)
CLASS  B
Car    bal3    2
       bal2    1
Cow    bal2    1
Ship   bal1    2
       bal2    1
Name: B, dtype: int64