Pandas: Groupby aggregation returns empty arrays

Question

I have a DataFrame:

import pandas as pd
import numpy as np

data = {
  "Key": ["A1", "A2", np.nan, "A3", "A4"],
  "Name": ["Candy A", "Candy B", np.nan, "Candy C", "Candy D"],
  "Amout": [25, 50, np.nan, np.nan, 50],
  "Condition": ["Good", "Good", "Good", "Good", "Good"],
  "Packing": ["25 Nice", "49 Nice", "1 Damaged", "40 Nice", "50 Nice"],
  "Sunlight" : [np.nan, np.nan, np.nan, np.nan, "No Sunlight"]
}

df = pd.DataFrame(data)

print(df) 
   Key     Name  Amout Condition    Packing     Sunlight
0   A1  Candy A   25.0      Good    25 Nice          NaN
1   A2  Candy B   50.0      Good    49 Nice          NaN
2  NaN      NaN    NaN      Good  1 Damaged          NaN
3   A3  Candy C    NaN      Good    40 Nice          NaN
4   A4  Candy D   50.0      Good    50 Nice  No Sunlight

I tried to reshape the DataFrame to make it tidier by merging each row that has no Key into the row above it:

def custom_agg(s):
    if pd.api.types.is_numeric_dtype(s):
        return s.sum(min_count=1)
    s = s.dropna().drop_duplicates()
    if len(s) > 1:
        return ', '.join(s.astype(str))
    return s

df = df.groupby(df['Key'].notna().cumsum(), as_index=False).agg(custom_agg)

print(df) 

In the Sunlight column, most of the values come back as empty arrays:

  Key     Name  Amout Condition             Packing     Sunlight
0  A1  Candy A   25.0      Good             25 Nice           []
1  A2  Candy B   50.0      Good  49 Nice, 1 Damaged           []
2  A3  Candy C    NaN      Good             40 Nice           []
3  A4  Candy D   50.0      Good             50 Nice  No Sunlight
{'index': [0, 1, 2, 3], 'columns': ['Key', 'Name', 'Amout', 'Condition', 'Packing', 'Sunlight'], 'data': [['A1', 'Candy A', 25.0, 'Good', '25 Nice', array([], dtype=object)], ['A2', 'Candy B', 50.0, 'Good', '49 Nice, 1 Damaged', array([], dtype=object)], ['A3', 'Candy C', nan, 'Good', '40 Nice', array([], dtype=object)], ['A4', 'Candy D', 50.0, 'Good', '50 Nice', 'No Sunlight']], 'index_names': [None], 'column_names': [None]}

I want those values to be NaN, not empty arrays. I tried replace and mask, but neither worked. Any ideas?

python pandas dataframe group-by aggregate
1 Answer

I think you can add a check to the custom_agg function that returns np.nan when s is an empty Series after the missing values are dropped:

s = s.dropna().drop_duplicates()
if len(s) == 0:
  return np.nan
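
To see why this check matters, here is a minimal sketch of what happens inside a single group. It assumes a group whose Sunlight values are all missing, as in the first three groups of the question; the variable names `group`, `cleaned`, and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# One group's "Sunlight" values from the question: every entry is missing.
group = pd.Series([np.nan], dtype=object)

# Same cleaning step as in custom_agg.
cleaned = group.dropna().drop_duplicates()
print(len(cleaned))   # 0, nothing is left once NaN is dropped
print(cleaned.empty)  # True

# Without the extra check, custom_agg falls through to `return s` and hands
# groupby an empty Series instead of a scalar. The check swaps in NaN.
result = np.nan if cleaned.empty else cleaned
print(result)         # nan
```

The empty Series is what shows up as `array([], dtype=object)` in the question's output.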

So the full code becomes:

import pandas as pd
import numpy as np

data = {
  "Key": ["A1", "A2", np.nan, "A3", "A4"],
  "Name": ["Candy A", "Candy B", np.nan, "Candy C", "Candy D"],
  "Amout": [25, 50, np.nan, np.nan, 50],
  "Condition": ["Good", "Good", "Good", "Good", "Good"],
  "Packing": ["25 Nice", "49 Nice", "1 Damaged", "40 Nice", "50 Nice"],
  "Sunlight" : [np.nan, np.nan, np.nan, np.nan, "No Sunlight"]
}

df = pd.DataFrame(data)

print(df) 

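def custom_agg(s):
    # Numeric columns: sum the group; min_count=1 keeps an all-NaN group as NaN.
    if pd.api.types.is_numeric_dtype(s):
        return s.sum(min_count=1)
    # Other columns: drop missing values and duplicates first.
    s = s.dropna().drop_duplicates()
    if len(s) > 1:
        # Several distinct values: join them into one string.
        return ', '.join(s.astype(str))
    elif len(s) == 0:
        # Nothing left: return NaN rather than an empty Series.
        return np.nan
    else:
        # Exactly one value left.
        return s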

df = df.groupby(df['Key'].notna().cumsum(), as_index=False).agg(custom_agg)

print(df)
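
As a cross-check, the sketch below is a variant that returns a plain scalar in every branch, so the result does not depend on how pandas handles a one-element Series returned from an aggregation function. The names `custom_agg_scalar`, `group_id`, and `out` are hypothetical and not part of the original answer. Renaming the grouper and calling `reset_index(drop=True)` instead of `as_index=False` is also an assumption of this sketch, made to keep the helper key from clashing with the existing Key column.

```python
import numpy as np
import pandas as pd

def custom_agg_scalar(s):
    """Hypothetical variant of custom_agg that always returns a scalar."""
    if pd.api.types.is_numeric_dtype(s):
        return s.sum(min_count=1)
    s = s.dropna().drop_duplicates()
    if s.empty:
        return np.nan        # nothing left in this group
    if len(s) == 1:
        return s.iloc[0]     # unwrap the single remaining value
    return ', '.join(s.astype(str))

data = {
    "Key": ["A1", "A2", np.nan, "A3", "A4"],
    "Name": ["Candy A", "Candy B", np.nan, "Candy C", "Candy D"],
    "Amout": [25, 50, np.nan, np.nan, 50],
    "Condition": ["Good", "Good", "Good", "Good", "Good"],
    "Packing": ["25 Nice", "49 Nice", "1 Damaged", "40 Nice", "50 Nice"],
    "Sunlight": [np.nan, np.nan, np.nan, np.nan, "No Sunlight"],
}
df = pd.DataFrame(data)

# A row without a Key belongs to the nearest row above that has one.
group_id = df["Key"].notna().cumsum().rename("group_id")
out = df.groupby(group_id).agg(custom_agg_scalar).reset_index(drop=True)

print(out["Sunlight"].isna().tolist())  # [True, True, True, False]
print(out.loc[1, "Packing"])            # 49 Nice, 1 Damaged
```

With this variant the first three Sunlight values are NaN and only the last group keeps "No Sunlight".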