For each group in a groupby, I want to sum certain rows across several columns and write the result to a new column, is_m_days.
DataFrame:
import numpy as np
import pandas as pd

data = {'ATEXT': ['', 'CT', 'RT', '', '', '', '', 'CT', 'CT', 'CT', 'TT', ''],
        'BEGUZ_UE': [11.0, 23.0, 33.0, 15.0, 12.75, 19.75, 14.75, 23.0,
                     24.0, 24.0, 33.0, 15.0],
        'subtract': [0.0, 0.0, 0.0, 0.2, np.nan, np.nan, 2.0, np.nan,
                     np.nan, np.nan, np.nan, 0.0],
        'add': [3.92, 0.0, 0.0, 0.0, np.nan, np.nan, 0.0, np.nan, np.nan,
                np.nan, np.nan, 3.57],
        'UE_more_days': [np.nan, np.nan, 56.0, np.nan, np.nan, np.nan, np.nan,
                         np.nan, np.nan, np.nan, 104.0, np.nan]}
df = pd.DataFrame(data)
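The result should be (asterisks mark the cells that are combined and the totals they produce in is_m_days):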
    ATEXT  BEGUZ_UE  subtract     add  UE_more_days  is_m_days
0             11.00    *0.00*  *3.92*
1   CT      *23.00*      0.00    0.00
2   RT      *33.00*      0.00    0.00          56.0
3           *15.00*      0.20    0.00                  *74.92*
4             12.75
5             19.75
6             14.75    *2.00*  *0.00*
7   CT      *23.00*
8   CT      *24.00*
9   CT      *24.00*
10  TT      *33.00*                           104.0
11          *15.00*      0.00    3.57                 *117.00*
12
etc.
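Reading the asterisks as the cells that feed each total (my interpretation of the table above, not something stated explicitly), the two expected values come from the BEGUZ_UE values of a non-empty block plus the empty row after it, plus add - subtract from the empty row just before the block:

```python
# Total written on row 3: BEGUZ_UE of rows 1-3, plus add - subtract of row 0.
group_1 = (23.00 + 33.00 + 15.00) + (3.92 - 0.00)

# Total written on row 11: BEGUZ_UE of rows 7-11, plus add - subtract of row 6.
group_2 = (23.00 + 24.00 + 24.00 + 33.00 + 15.00) + (0.00 - 2.00)

print(round(group_1, 2), round(group_2, 2))  # 74.92 117.0
```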
My attempt:
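m = df['ATEXT'].eq("")          # True where ATEXT is empty
cond = (~m) & m.shift(-1)       # non-empty row followed by an empty row
# running sum of BEGUZ_UE within each block of non-empty rows, kept only on the block's last row
df['UE_more_days'] = (df['BEGUZ_UE'].mask(m)
                        .groupby(m.cumsum()).cumsum()
                        .where(cond)
                     )
# add - subtract per m.cumsum() group, using the max of the previous row's values in that group
tmv = (df[['subtract', 'add']]
       .shift()
       .groupby(m.cumsum())
       .transform('max')
       .eval('add-subtract')
      )
# block total (including the empty row after it) plus tmv, moved down onto that empty row
df['is_m_days'] = (df.groupby(m[::-1].cumsum())['BEGUZ_UE']
                     .transform('sum')
                     .add(tmv)
                     .where(cond)
                     .shift()
                  )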
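The attempt relies on two different group keys, m.cumsum() and m[::-1].cumsum(). A minimal sketch on just the ATEXT column from the question (the names fwd and rev are illustrative) shows how each one splits the rows:

```python
import pandas as pd

# ATEXT column from the question; '' marks an "empty" row.
atext = pd.Series(['', 'CT', 'RT', '', '', '', '', 'CT', 'CT', 'CT', 'TT', ''])
m = atext.eq('')

# Forward cumsum: every empty row opens a new group,
# so a group is "empty row, then the non-empty rows after it".
fwd = m.cumsum()

# Reverse cumsum: counting from the bottom, every empty row closes a group,
# so a group is "non-empty rows, then the empty row after them".
rev = m[::-1].cumsum().sort_index()

print(fwd.tolist())  # [1, 1, 1, 2, 3, 4, 5, 5, 5, 5, 5, 6]
print(rev.tolist())  # [6, 5, 5, 5, 4, 3, 2, 1, 1, 1, 1, 1]
```

In the attempt itself the reversed key is passed straight to groupby, which aligns on the index, so the sort_index() here only serves to print the keys in row order.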
Is there a better solution?
Your approach is fine; you can simplify it to use a single groupby (with a few extra boolean masks):
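import pandas as pd

m1 = df['ATEXT'].eq('')                # True where ATEXT is empty
m2 = m1 & m1.shift(fill_value=True)    # empty row whose previous row is also empty (or the first row)
m3 = m1 != m2                          # empty row right after a non-empty block: where results go
group = m2.cumsum()                    # every m2 row starts a new group
df.loc[m3, 'is_m_days'] = (pd
    .DataFrame({'A': df['BEGUZ_UE'].mask(m2),                     # BEGUZ_UE, ignoring m2 rows
                'B': df['add'].sub(df['subtract']).where(m2)})    # add - subtract, only on m2 rows
    .groupby(group).transform('sum').sum(axis=1)
)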
Output:
    ATEXT  BEGUZ_UE  subtract   add  UE_more_days  is_m_days
0             11.00       0.0  3.92           NaN        NaN
1      CT     23.00       0.0  0.00           NaN        NaN
2      RT     33.00       0.0  0.00          56.0        NaN
3             15.00       0.2  0.00           NaN      74.92
4             12.75       NaN   NaN           NaN        NaN
5             19.75       NaN   NaN           NaN        NaN
6             14.75       2.0  0.00           NaN        NaN
7      CT     23.00       NaN   NaN           NaN        NaN
8      CT     24.00       NaN   NaN           NaN        NaN
9      CT     24.00       NaN   NaN           NaN        NaN
10     TT     33.00       NaN   NaN         104.0        NaN
11            15.00       0.0  3.57           NaN     117.00
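To see why the single groupby lands on the right rows, here is a sketch that evaluates only the masks from the answer, applied to the question's ATEXT column (rebuilt standalone here as the Series atext):

```python
import pandas as pd

atext = pd.Series(['', 'CT', 'RT', '', '', '', '', 'CT', 'CT', 'CT', 'TT', ''])

m1 = atext.eq('')                      # empty ATEXT
m2 = m1 & m1.shift(fill_value=True)    # empty row preceded by an empty row (or at the top)
m3 = m1 != m2                          # equivalent to m1 & ~m2, since m2 implies m1
group = m2.cumsum()                    # each m2 row starts a new group

print(m2.astype(int).tolist())  # [1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(m3[m3].index.tolist())    # [3, 11]
print(group.tolist())           # [1, 1, 1, 1, 2, 3, 4, 4, 4, 4, 4, 4]
```

Rows where m2 is True contribute add - subtract (column B) and are excluded from BEGUZ_UE (column A); each of them starts a new group, so rows 0-3 form one group and rows 6-11 another. Row 3 is not in m2, which is why its subtract value of 0.2 is ignored, matching the expected result. Rows 4 and 5 form single-row groups that are never written, because m3 is False there.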