I have a pandas DataFrame like this:
ID Company Accepted
1 A Yes
1 B Yes
1 C No
2 B No
2 C No
3 A No
3 C Yes
3 D No
3 E Yes
4 A No
4 C No
I want to filter the DataFrame so that every ID that has a "Yes" in Accepted is removed entirely. That would leave me with:
ID Company Accepted
2 B No
2 C No
4 A No
4 C No
In other words, drop all rows for any ID that has at least one acceptance. What is the smartest way to do this? Thanks!
This should work. A set is a good fit here because membership lookups are O(1) on average.
rem = set(df.loc[df['Accepted'] == 'Yes', 'ID'])
df = df[~df['ID'].isin(rem)]
# ID Company Accepted
# 3 2 B No
# 4 2 C No
# 9 4 A No
# 10 4 C No
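For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same set-based filter. The way df is constructed and the name result are my own assumptions; the question only shows the printed table.

```python
import pandas as pd

# Rebuild the sample data from the question (default index 0..10).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4],
    "Company": list("ABCBCACDEAC"),
    "Accepted": ["Yes", "Yes", "No", "No", "No", "No",
                 "Yes", "No", "Yes", "No", "No"],
})

# IDs that have at least one 'Yes' row.
rem = set(df.loc[df["Accepted"] == "Yes", "ID"])

# Keep only the rows whose ID is not in that set.
result = df[~df["ID"].isin(rem)]
print(result)
```

Assigning to result instead of overwriting df keeps the original frame around, which is handy when comparing approaches.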
Since you mentioned filter:
df.groupby('ID').filter(lambda x : (x['Accepted']=='No').all())
Out[1017]:
ID Company Accepted
3 2 B No
4 2 C No
9 4 A No
10 4 C No
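One caveat: the lambda passed to filter is called once per group in Python, which may get slow with many distinct IDs. A possible alternative, sketched here and not benchmarked, broadcasts a per-group any() back onto the rows with transform. The sample frame and the names has_yes and result are mine, for illustration only.

```python
import pandas as pd

# Sample data from the question (default index 0..10).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4],
    "Company": list("ABCBCACDEAC"),
    "Accepted": ["Yes", "Yes", "No", "No", "No", "No",
                 "Yes", "No", "Yes", "No", "No"],
})

# Row-level flag: is this row a 'Yes'?
is_yes = df["Accepted"] == "Yes"

# Group the flag by ID and broadcast "any True in the group" to every row.
has_yes = is_yes.groupby(df["ID"]).transform("any")

# Keep only rows whose ID never had a 'Yes'.
result = df[~has_yes]
print(result)
```

Unlike the all-'No' check above, this keys on the presence of 'Yes', so the two only differ if Accepted can hold other values.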
If your DataFrame is called df, this will do it:
# find the ids that have at least one 'Yes' entry
yes_ids = df.loc[df.Accepted == 'Yes', 'ID']
# Drop every row whose ID is in that list (drop returns a new DataFrame)
df.drop(df[df.ID.isin(yes_ids)].index)
Returns:
ID Company Accepted
3 2 B No
4 2 C No
9 4 A No
10 4 C No