I have a DataFrame:
import numpy as np
import pandas as pd

data = {
    "Key": ["A1", "A2", np.nan, "A3", "A4"],
    "Name": ["Candy A", "Candy B", np.nan, "Candy C", "Candy D"],
    "Amout": [25, 50, np.nan, np.nan, 50],
    "Condition": ["Good", "Good", "Good", "Good", "Good"],
    "Packing": ["25 Nice", "49 Nice", "1 Damaged", "40 Nice", "50 Nice"],
    "Sunlight": [np.nan, np.nan, np.nan, np.nan, "No Sunlight"]
}
df = pd.DataFrame(data)
print(df)
   Key     Name  Amout Condition    Packing     Sunlight
0   A1  Candy A   25.0      Good    25 Nice          NaN
1   A2  Candy B   50.0      Good    49 Nice          NaN
2  NaN      NaN    NaN      Good  1 Damaged          NaN
3   A3  Candy C    NaN      Good    40 Nice          NaN
4   A4  Candy D   50.0      Good    50 Nice  No Sunlight
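I tried to reshape the DataFrame to make it tidier, merging each row that has no Key into the row above it:
def custom_agg(s):
    # Numeric columns: sum, but keep NaN when the whole group is NaN
    if pd.api.types.is_numeric_dtype(s):
        return s.sum(min_count=1)
    # Other columns: keep the unique non-null values
    s = s.dropna().drop_duplicates()
    if len(s) > 1:
        return ', '.join(s.astype(str))
    return s

df = df.groupby(df['Key'].notna().cumsum(), as_index=False).agg(custom_agg)
print(df)
In the Sunlight column, most of the values come out as empty arrays: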
   Key     Name  Amout Condition             Packing     Sunlight
0   A1  Candy A   25.0      Good             25 Nice           []
1   A2  Candy B   50.0      Good  49 Nice, 1 Damaged           []
2   A3  Candy C    NaN      Good             40 Nice           []
3   A4  Candy D   50.0      Good             50 Nice  No Sunlight
{'index': [0, 1, 2, 3], 'columns': ['Key', 'Name', 'Amout', 'Condition', 'Packing', 'Sunlight'], 'data': [['A1', 'Candy A', 25.0, 'Good', '25 Nice', array([], dtype=object)], ['A2', 'Candy B', 50.0, 'Good', '49 Nice, 1 Damaged', array([], dtype=object)], ['A3', 'Candy C', nan, 'Good', '40 Nice', array([], dtype=object)], ['A4', 'Candy D', 50.0, 'Good', '50 Nice', 'No Sunlight']], 'index_names': [None], 'column_names': [None]}
I want those cells to be NaN rather than empty arrays. I tried replace and mask, but neither worked. Any ideas?
I think you can add a check to the custom_agg function: if s is an empty Series after dropping the NaN values, return np.nan.
s = s.dropna().drop_duplicates()
if len(s) == 0:
    return np.nan
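To see where the empty values come from, here is a minimal sketch. The Series `s` below is made-up illustration data, not taken from the question. For a group whose values are all NaN, `dropna()` leaves nothing behind, so the function ends up returning an empty Series, which seems to be what shows up as `[]` in the aggregated frame.

```python
import numpy as np
import pandas as pd

# Hypothetical all-NaN group, similar to the "Sunlight" values for A1..A3
s = pd.Series([np.nan, np.nan], dtype=object)

# Same cleaning step as in custom_agg
cleaned = s.dropna().drop_duplicates()

print(len(cleaned))   # 0
print(cleaned.empty)  # True
```

That would also explain why replace and mask did not help: the cell would hold an array-like object rather than a scalar they can match against.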
With that check in place, the full script becomes:
import pandas as pd
import numpy as np

data = {
    "Key": ["A1", "A2", np.nan, "A3", "A4"],
    "Name": ["Candy A", "Candy B", np.nan, "Candy C", "Candy D"],
    "Amout": [25, 50, np.nan, np.nan, 50],
    "Condition": ["Good", "Good", "Good", "Good", "Good"],
    "Packing": ["25 Nice", "49 Nice", "1 Damaged", "40 Nice", "50 Nice"],
    "Sunlight": [np.nan, np.nan, np.nan, np.nan, "No Sunlight"]
}
df = pd.DataFrame(data)
print(df)

def custom_agg(s):
    # Numeric columns: sum, but keep NaN when the whole group is NaN
    if pd.api.types.is_numeric_dtype(s):
        return s.sum(min_count=1)
    # Other columns: keep the unique non-null values
    s = s.dropna().drop_duplicates()
    if len(s) > 1:
        # Several distinct values: join them into one string
        return ', '.join(s.astype(str))
    elif len(s) == 0:
        # Nothing left in this group: return NaN instead of an empty Series
        return np.nan
    else:
        # Exactly one unique value left
        return s

# Rows without a Key are grouped with the closest preceding row that has one
df = df.groupby(df['Key'].notna().cumsum(), as_index=False).agg(custom_agg)
print(df)
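As a variation, here is a sketch in which the aggregation function always returns a scalar, so the result does not depend on how pandas unwraps a one-element Series. Two things differ from the code above and are my own assumptions rather than part of the original answer: the names `group_id` and `out` are made up for illustration, and the grouper is renamed to "group" and dropped afterwards with `reset_index(drop=True)` instead of using `as_index=False`, to avoid a key that shares its name with the "Key" column.

```python
import numpy as np
import pandas as pd

data = {
    "Key": ["A1", "A2", np.nan, "A3", "A4"],
    "Name": ["Candy A", "Candy B", np.nan, "Candy C", "Candy D"],
    "Amout": [25, 50, np.nan, np.nan, 50],
    "Condition": ["Good", "Good", "Good", "Good", "Good"],
    "Packing": ["25 Nice", "49 Nice", "1 Damaged", "40 Nice", "50 Nice"],
    "Sunlight": [np.nan, np.nan, np.nan, np.nan, "No Sunlight"],
}
df = pd.DataFrame(data)


def custom_agg(s):
    # Numeric columns: sum, NaN if every value in the group is NaN
    if pd.api.types.is_numeric_dtype(s):
        return s.sum(min_count=1)
    s = s.dropna().drop_duplicates()
    # Empty group: return a scalar NaN rather than an empty Series
    if s.empty:
        return np.nan
    # One or more values: joining a single value just returns it as a string
    return ", ".join(s.astype(str))


# Hypothetical helper name: rows without a Key inherit the previous group id
group_id = df["Key"].notna().cumsum().rename("group")

out = df.groupby(group_id).agg(custom_agg).reset_index(drop=True)
print(out)
```

Note that this version converts every non-numeric value to a string, which is fine for the sample data but may not be what you want for other object columns.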