Here is a script that simulates the problem I am facing:
import pandas as pd
import numpy as np
data = {
'a':[1,2,1,1,2,1,1],
'b':[10,40,20,10,40,10,20],
'c':[0.3, 0.2, 0.6, 0.4, 0.5, 0.2, 0.8],
'd':[3, 1, 5, 1, 7, 2., 2.],
}
df = pd.DataFrame.from_dict(data)
# I apply some custom function to populate column 'e'.
# For demonstration, I am using a very simple function here.
df['e']=df.apply(lambda x: x['c']<=0.3, axis=1)
# This is the column I need to obtain using groupby and pipe/transform
df['f']=[2., 1., 0., 2., 1., 2., 0.]
print(df)
Output:
   a   b    c    d      e    f
0  1  10  0.3  3.0   True  2.0
1  2  40  0.2  1.0   True  1.0
2  1  20  0.6  5.0  False  0.0
3  1  10  0.4  1.0  False  2.0
4  2  40  0.5  7.0  False  1.0
5  1  10  0.2  2.0   True  2.0
6  1  20  0.8  2.0  False  0.0
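The logic for computing column f is as follows. For each group of df.groupby(['a', 'b']):
select the entries where e is True.
If any entries are selected, find the one with the smallest d and return d (in the real application, d needs to be combined with other columns before the result is returned).
Otherwise, return 0.

What I have tried: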
def func(x):
    print(type(x))
    print(x)
    print('-'*50)
    ind=np.where(x['e']) #<--- How can I implement this?
    if len(ind)>0:
        ind_min=np.argmin(x.iloc[ind]['d'])
        return x.iloc[ind[ind_min]]['d']
    else:
        return 0

df['g']=df.groupby(['a', 'b']).pipe(func)
Output:
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001B348735550>
--------------------------------------------------
...
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (3, 2) + inhomogeneous part.
I get the above error on this line:
ind=np.where(x['e']) #<--- How can I implement this?
So, how can I apply np.where to a pandas.core.groupby.generic.DataFrameGroupBy object?
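You are asking an XY question. Use Series.where before applying df.groupby, so that d is NaN wherever e is False and the group minimum only sees the rows you want: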
df['f'] = (
    df.assign(d=df['d'].where(df['e']))
      .groupby(['a', 'b'])['d']
      .transform('min')
      .fillna(0)
)
Output:
   a   b    c    d      e    f
0  1  10  0.3  3.0   True  2.0
1  2  40  0.2  1.0   True  1.0
2  1  20  0.6  5.0  False  0.0
3  1  10  0.4  1.0  False  2.0
4  2  40  0.5  7.0  False  1.0
5  1  10  0.2  2.0   True  2.0
6  1  20  0.8  2.0  False  0.0
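
To make the chain easier to follow, here is a rough sketch that splits it into its individual steps, followed by a per-group variant for the case mentioned in the question where d has to be combined with other columns. The names masked_d, group_min, min_d_where_e and the column g are only illustrative, and the per-group function is an assumption about what the real logic might look like, not something taken from the question. It uses groupby(...).apply, which calls the function once per group, whereas pipe passes the whole GroupBy object in a single call.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 1, 1, 2, 1, 1],
    'b': [10, 40, 20, 10, 40, 10, 20],
    'c': [0.3, 0.2, 0.6, 0.4, 0.5, 0.2, 0.8],
    'd': [3, 1, 5, 1, 7, 2., 2.],
})
df['e'] = df['c'] <= 0.3

# Step 1: keep d only where e is True; every other row becomes NaN.
masked_d = df['d'].where(df['e'])

# Step 2: per-group minimum, broadcast back to each row.
# 'min' skips NaN, so only rows with e == True take part.
group_min = masked_d.groupby([df['a'], df['b']]).transform('min')

# Step 3: a group with no True row is all NaN, so it falls back to 0.
df['f'] = group_min.fillna(0)


# Hypothetical per-group variant: the function receives one group's
# rows as a DataFrame, so several columns can be combined inside it.
def min_d_where_e(group):
    selected = group.loc[group['e'], 'd']
    return selected.min() if not selected.empty else 0.0


per_group = (
    df.groupby(['a', 'b'])[['d', 'e']]
      .apply(min_d_where_e)
      .rename('g')
)

# Attach the one-value-per-group result to every row of its group.
df = df.join(per_group, on=['a', 'b'])
print(df)
```

Columns f and g should come out identical here. The vectorised version is usually preferable on large frames, since apply runs a Python function for every group.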