I have a list of subcategory ids and a DataFrame. I want to filter the DataFrame by each subcategory in the list.
lst = [7774, 29409, 36611, 77553]
import pandas as pd
data = {'aucctlg_id': [143424, 143424, 143424, 143388, 143388, 143430],
'catalogversion_id': [1, 1, 1, 1, 1, 1.2],
'Key': [1434241, 1434241, 1434241, 1433881, 1433881, 14343012],
'item_id': [4501118, 4501130, 4501129, 4501128, 4501127, 4501126],
'catlog_description': ['M&BP PIG IRON FA', 'M&BP PIG IRON FA', 'M&BP PIG IRON FA', 'PIG IRON MIXED OG','PIG IRON MIXED OG', 'P.S JAM & PIG IRON FINES'],
'catlog_start_date': ['17-05-2024 11:00:00', '17-05-2024 11:00:00', '17-05-2024 11:00:00', '17-05-2024 11:00:00','17-05-2024 11:00:00', '17-05-2024 11:00:00'],
'subcategoryid': [29409, 29409, 29409, 7774, 7774, 36611],
'quantity': [200, 200, 200, 180, 180, 100],
'auctionable': ['Y', 'Y', 'Y', 'Y' ,'Y' ,'Y']
}
df = pd.DataFrame(data)
print(df)
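I tried the following code, but it produces a list of single-subcategory DataFrames, whereas I want each subcategory as its own DataFrame: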
new=[]
for i in range(0, len(lst)):
    mask1 = df['subcategoryid']==(lst[i])
    df2 = df.loc[mask1]
    new.append(df2)
Desired output: one file per subcategory, each containing the filtered data:
df_7774, df_29409, df_36611
Pre-filter with isin, then use groupby:
lst = [7774, 29409, 36611, 77553]
out = dict(list(df[df['subcategoryid'].isin(lst)].groupby('subcategoryid')))
This creates a dictionary of the desired DataFrames, keyed by subcategoryid:
{7774: aucctlg_id catalogversion_id Key item_id catlog_description catlog_start_date subcategoryid quantity auctionable
3 143388 1.0 1433881 4501128 PIG IRON MIXED OG 17-05-2024 11:00:00 7774 180 Y
4 143388 1.0 1433881 4501127 PIG IRON MIXED OG 17-05-2024 11:00:00 7774 180 Y,
29409: aucctlg_id catalogversion_id Key item_id catlog_description catlog_start_date subcategoryid quantity auctionable
0 143424 1.0 1434241 4501118 M&BP PIG IRON FA 17-05-2024 11:00:00 29409 200 Y
1 143424 1.0 1434241 4501130 M&BP PIG IRON FA 17-05-2024 11:00:00 29409 200 Y
2 143424 1.0 1434241 4501129 M&BP PIG IRON FA 17-05-2024 11:00:00 29409 200 Y,
36611: aucctlg_id catalogversion_id Key item_id catlog_description catlog_start_date subcategoryid quantity auctionable
5 143430 1.2 14343012 4501126 P.S JAM & PIG IRON FINES 17-05-2024 11:00:00 36611 100 Y}
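As a minimal sketch of how the resulting dictionary might be used: each key is a subcategory id that actually occurs in the data, so 77553 (present in lst but absent from df) gets no entry. The trimmed-down frame below and the dict.get fallback to an empty frame are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Trimmed-down stand-in for the question's DataFrame (illustrative only)
df = pd.DataFrame({
    'item_id': [4501118, 4501130, 4501129, 4501128, 4501127, 4501126],
    'subcategoryid': [29409, 29409, 29409, 7774, 7774, 36611],
    'quantity': [200, 200, 200, 180, 180, 100],
})
lst = [7774, 29409, 36611, 77553]

# Same pattern as the answer: pre-filter with isin, then split with groupby
out = dict(list(df[df['subcategoryid'].isin(lst)].groupby('subcategoryid')))

# Look up one subcategory's rows by its id
df_7774 = out[7774]

# 77553 never appears in the data, so it has no key;
# fall back to an empty frame with the same columns (an assumed convention)
df_77553 = out.get(77553, df.iloc[0:0])

print(df_7774)
print(df_77553.empty)
```

Keeping the frames in a dictionary rather than in separate variables such as df_7774 tends to be easier to work with when the list of ids changes.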
If you want to write the files directly, without the intermediate dictionary:
lst = [7774, 29409, 36611, 77553]
for k, g in df[df['subcategoryid'].isin(lst)].groupby('subcategoryid'):
    # k is the subcategory id, g is the sub-DataFrame for that id
    g.to_csv(f'df_{k}')
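A slightly more defensive variant of the loop above might look like the sketch below. The temporary output directory, the .csv extension, and index=False are all assumptions added for illustration; the original answer writes to a bare df_{k} filename in the working directory and keeps the index.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Trimmed-down stand-in for the question's DataFrame (illustrative only)
df = pd.DataFrame({
    'item_id': [4501118, 4501130, 4501129, 4501128, 4501127, 4501126],
    'subcategoryid': [29409, 29409, 29409, 7774, 7774, 36611],
    'quantity': [200, 200, 200, 180, 180, 100],
})
lst = [7774, 29409, 36611, 77553]

# Hypothetical output location; swap in a real directory as needed
out_dir = Path(tempfile.mkdtemp())

written = []
for k, g in df[df['subcategoryid'].isin(lst)].groupby('subcategoryid'):
    path = out_dir / f'df_{k}.csv'
    g.to_csv(path, index=False)  # index=False drops the original row labels
    written.append(path.name)

# Read one file back to confirm it holds only that subcategory's rows
check = pd.read_csv(out_dir / 'df_7774.csv')
print(written)
print(check)
```

No file is produced for 77553, since groupby only yields groups that exist after the isin filter.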