I'm trying to convert an existing SQL statement to pandas. These are the dataframes I'm working with:
df_products:
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY
1 P1 PRODUCT_P1 12 15 UPS
2 P2 PRODUCT_P2 4 3 DHL
3 P3 PRODUCT_P3 120 22 DHL
4 P1 PRODUCT_P1 423 18 UPS
5 P2 PRODUCT_P2 0 5 GLS
6 P3 PRODUCT_P3 53 10 DHL
7 P4 PRODUCT_P4 22 0 UPS
8 P1 PRODUCT_P1 94 56 GLS
9 P1 PRODUCT_P1 9 24 GLS
and
df_accessories:
ID ACCESSORY_ID NAME DEL_BY SUITABLE_FOR MANUFACTURER
100 A1 ACCESSORY_1 DHL P1 KUNG
101 A2 ACCESSORY_2 UPS P1 PAO
102 A3 ACCESSORY_3 GLS P1 PAO
103 A4 ACCESSORY_4 UPS P3 PAK
104 A5 ACCESSORY_5 DHL P2 PAK
I'm trying to write a pandas version of this SQL query:
SELECT *
FROM products a
LEFT JOIN accessories b
ON b.DEL_BY = 'UPS'
AND a.PRODUCT_ID = b.SUITABLE_FOR
AND b.MANUFACTURER != 'PAK'
I tried to solve it like this:
joined = df_products.merge(df_accessories, left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left')
filtered = joined.loc[(joined['DEL_BY'] == 'UPS') & (joined['MANUFACTURER'] != 'PAK')]
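But I don't think this works. I'm already struggling with the first condition, ON b.DEL_BY = 'UPS': I don't know where to put it in the pandas merge function.
I expect this result: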
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY ID ACCESSORY_ID NAME.1 DEL_BY SUITABLE_FOR MANUFACTURER
0 1 P1 PRODUCT_P1 12 15 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
1 2 P2 PRODUCT_P2 4 3 DHL NaN NaN NaN NaN NaN NaN
2 3 P3 PRODUCT_P3 120 22 DHL NaN NaN NaN NaN NaN NaN
3 4 P1 PRODUCT_P1 423 18 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
4 5 P2 PRODUCT_P2 0 5 GLS NaN NaN NaN NaN NaN NaN
5 6 P3 PRODUCT_P3 53 10 DHL NaN NaN NaN NaN NaN NaN
6 7 P4 PRODUCT_P4 22 0 UPS NaN NaN NaN NaN NaN NaN
7 8 P1 PRODUCT_P1 94 56 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
8 9 P1 PRODUCT_P1 9 24 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
But this is what I get:
ID_x PRODUCT_ID NAME_x STOCK SELL_COUNT DELIVERED_BY ID_y ACCESSORY_ID NAME_y DEL_BY SUITABLE_FOR MANUFACTURER
1 1 P1 PRODUCT_P1 12 15 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
6 4 P1 PRODUCT_P1 423 18 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
12 8 P1 PRODUCT_P1 94 56 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
15 9 P1 PRODUCT_P1 9 24 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
Thanks
You have to filter the right dataframe (df_accessories) before merging:
df_products.merge(df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"'),
left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left',
suffixes=('', '.1'))
The .query(...) call is equivalent to slicing the dataframe with a boolean mask:
cond = (df_accessories['DEL_BY'] == 'UPS') & (df_accessories['MANUFACTURER'] != 'PAK')
df_products.merge(df_accessories[cond], ...)
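To make the mask-then-merge approach concrete, here is a minimal self-contained sketch. It rebuilds both frames from the tables in the question; the name `result` is only illustrative, and the `suffixes=('', '.1')` choice is carried over from the snippet above.

```python
import pandas as pd

# Rebuild the two frames from the tables shown in the question.
df_products = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "PRODUCT_ID": ["P1", "P2", "P3", "P1", "P2", "P3", "P4", "P1", "P1"],
    "NAME": ["PRODUCT_P1", "PRODUCT_P2", "PRODUCT_P3", "PRODUCT_P1", "PRODUCT_P2",
             "PRODUCT_P3", "PRODUCT_P4", "PRODUCT_P1", "PRODUCT_P1"],
    "STOCK": [12, 4, 120, 423, 0, 53, 22, 94, 9],
    "SELL_COUNT": [15, 3, 22, 18, 5, 10, 0, 56, 24],
    "DELIVERED_BY": ["UPS", "DHL", "DHL", "UPS", "GLS", "DHL", "UPS", "GLS", "GLS"],
})
df_accessories = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104],
    "ACCESSORY_ID": ["A1", "A2", "A3", "A4", "A5"],
    "NAME": ["ACCESSORY_1", "ACCESSORY_2", "ACCESSORY_3", "ACCESSORY_4", "ACCESSORY_5"],
    "DEL_BY": ["DHL", "UPS", "GLS", "UPS", "DHL"],
    "SUITABLE_FOR": ["P1", "P1", "P1", "P3", "P2"],
    "MANUFACTURER": ["KUNG", "PAO", "PAO", "PAK", "PAK"],
})

# ON conditions that only involve the right table become a pre-filter on it.
cond = (df_accessories["DEL_BY"] == "UPS") & (df_accessories["MANUFACTURER"] != "PAK")

# The remaining ON condition (a.PRODUCT_ID = b.SUITABLE_FOR) is the merge key.
result = df_products.merge(
    df_accessories[cond],
    left_on="PRODUCT_ID",
    right_on="SUITABLE_FOR",
    how="left",
    suffixes=("", ".1"),  # overlapping ID / NAME columns from the right get ".1"
)

print(result)
```

All nine product rows survive, and only the P1 rows pick up accessory A2; every other row has NaN in the accessory columns.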
I would do it like this: first filter df_accessories on the join conditions that do not involve df_products, then merge the result with df_products:
(
    df_accessories
    .query('MANUFACTURER != "PAK" and DEL_BY == "UPS"')
    .merge(df_products, right_on='PRODUCT_ID', left_on='SUITABLE_FOR', how='right')
    .sort_values('ID_y')
)
Output:
ID_x ACCESSORY_ID NAME_x DEL_BY SUITABLE_FOR MANUFACTURER ID_y PRODUCT_ID NAME_y STOCK SELL_COUNT DELIVERED_BY
0 101.0 A2 ACCESSORY_2 UPS P1 PAO 1 P1 PRODUCT_P1 12 15 UPS
4 NaN NaN NaN NaN NaN NaN 2 P2 PRODUCT_P2 4 3 DHL
6 NaN NaN NaN NaN NaN NaN 3 P3 PRODUCT_P3 120 22 DHL
1 101.0 A2 ACCESSORY_2 UPS P1 PAO 4 P1 PRODUCT_P1 423 18 UPS
5 NaN NaN NaN NaN NaN NaN 5 P2 PRODUCT_P2 0 5 GLS
7 NaN NaN NaN NaN NaN NaN 6 P3 PRODUCT_P3 53 10 DHL
8 NaN NaN NaN NaN NaN NaN 7 P4 PRODUCT_P4 22 0 UPS
2 101.0 A2 ACCESSORY_2 UPS P1 PAO 8 P1 PRODUCT_P1 94 56 GLS
3 101.0 A2 ACCESSORY_2 UPS P1 PAO 9 P1 PRODUCT_P1 9 24 GLS
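For comparison, the following sketch shows why the attempt in the question loses rows. Filtering after the merge acts like a SQL WHERE clause, so products without a matching accessory are dropped, whereas filtering the accessories first keeps the LEFT JOIN semantics. The frames here are trimmed-down copies of the question's data (only the columns the join needs), and the names `after`, `before`, and `keep` are illustrative.

```python
import pandas as pd

# Trimmed-down copies of the question's frames: only the join-relevant columns.
df_products = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "PRODUCT_ID": ["P1", "P2", "P3", "P1", "P2", "P3", "P4", "P1", "P1"],
})
df_accessories = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104],
    "DEL_BY": ["DHL", "UPS", "GLS", "UPS", "DHL"],
    "SUITABLE_FOR": ["P1", "P1", "P1", "P3", "P2"],
    "MANUFACTURER": ["KUNG", "PAO", "PAO", "PAK", "PAK"],
})

# Filter AFTER the merge: rows whose accessory columns are NaN fail the
# DEL_BY == 'UPS' test, so unmatched products disappear (WHERE-like behaviour).
joined = df_products.merge(
    df_accessories, left_on="PRODUCT_ID", right_on="SUITABLE_FOR", how="left"
)
after = joined[(joined["DEL_BY"] == "UPS") & (joined["MANUFACTURER"] != "PAK")]

# Filter BEFORE the merge: every product row is kept, as in the LEFT JOIN ... ON query.
keep = (df_accessories["DEL_BY"] == "UPS") & (df_accessories["MANUFACTURER"] != "PAK")
before = df_products.merge(
    df_accessories[keep], left_on="PRODUCT_ID", right_on="SUITABLE_FOR", how="left"
)

print(len(joined), len(after), len(before))  # 17 4 9
```

The unfiltered merge has 17 rows because each of the four P1 products matches three accessories; the post-merge filter keeps only the four P1/A2 rows, while the pre-merge filter returns all nine products.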