pandas: create a boolean column from multiple conditions on another column, per group

Question (votes: 0, answers: 1)

I have the following df:

cluster_id   inv_id    
1            A1
1            A1
2            A1111A
2            A1111A

I want to groupby cluster_id and create a column named invalid_inv_id based on two conditions on inv_id:

1. in each cluster, if the length of inv_id (stripped of non numerics) < 100 set "invalid_inv_id" to true;

or

2. in each cluster, if the length of inv_id is < 3 set "invalid_inv_id" to true.

My code looks like this:

df['inv_id_stp'] = df.inv_id.str.replace(r'\D+', '', regex=True)

grouped = df.groupby('cluster_id')

df['invalid_inv_id'] = grouped['inv_id_stp'].transform(lambda x: x.str.len()) < 100

df['invalid_inv_id'] = grouped['inv_id'].transform(lambda x: x.str.len()) < 3

I would like to know how to combine these two conditions into one line of code, so that the result looks like this:

cluster_id    inv_id    invalid_inv_id
1             A1         True
1             A1         True
2             A1111A     True
2             A1111A     True
python-3.x pandas dataframe pandas-groupby
1 Answer
1 vote

IIUC, you do not need groupby here: both conditions are evaluated row by row.

(df.inv_id.str.len()<3)|(df.inv_id.str.replace(r'\D+', '', regex=True).str.len()<100)
Out[472]: 
0    True
1    True
2    True
3    True
Name: inv_id, dtype: bool
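
For reference, here is a minimal, self-contained sketch of the row-wise version, assuming the sample frame from the question. The intermediate names `too_short`, `digits_only` and `few_digits` are illustrative and not part of the original answer. `regex=True` is passed explicitly because pandas 2.0 changed the default of `str.replace` to literal matching.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2],
    "inv_id": ["A1", "A1", "A1111A", "A1111A"],
})

# Condition 2: the raw inv_id is shorter than 3 characters.
too_short = df["inv_id"].str.len() < 3

# Condition 1: inv_id stripped of non-digits is shorter than 100 characters.
digits_only = df["inv_id"].str.replace(r"\D+", "", regex=True)
few_digits = digits_only.str.len() < 100

# Combine both conditions with an element-wise OR and store the result.
df["invalid_inv_id"] = too_short | few_digits
print(df)
```

With the thresholds from the question, every row satisfies the second condition, so all four rows are flagged.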

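If you need the flag per group (True for the whole cluster when any of its rows matches), add groupby with transform('any'):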

((df.inv_id.str.len()<3)|(df.inv_id.str.replace(r'\D+', '', regex=True).str.len()<100)).groupby(df['cluster_id']).transform('any')
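
The effect of transform('any') is easier to see on data where rows inside one cluster differ. The sketch below is only an illustration: the helper `flag_invalid`, its parameters `min_len` and `min_digits`, the sample inv_id values, and the threshold of 4 digits are all hypothetical and chosen so that the two clusters end up with different results.

```python
import pandas as pd


def flag_invalid(df, min_len=3, min_digits=100):
    """Hypothetical helper: flag a whole cluster if any of its rows fails a check."""
    row_flag = (df["inv_id"].str.len() < min_len) | (
        df["inv_id"].str.replace(r"\D+", "", regex=True).str.len() < min_digits
    )
    # Broadcast "any row in the cluster is flagged" back to every row of that cluster.
    return row_flag.groupby(df["cluster_id"]).transform("any")


# Made-up data: only the first row of cluster 1 is too short.
demo = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2],
    "inv_id": ["A1", "A12345", "B123456", "C1234567"],
})

# A smaller digit threshold (assumed for the demo) so that not every row is flagged.
demo["invalid_inv_id"] = flag_invalid(demo, min_len=3, min_digits=4)
print(demo)
```

Here only "A1" fails a check on its own, yet both rows of cluster 1 are marked True, while cluster 2 stays False.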