How do I do a left join on multiple conditions in pandas?

Question (votes: 0, answers: 2)

I'm trying to translate an existing SQL statement into pandas. These are the dataframes I'm working with:

df_products:

ID  PRODUCT_ID        NAME  STOCK  SELL_COUNT DELIVERED_BY        
1         P1  PRODUCT_P1     12          15          UPS  
2         P2  PRODUCT_P2      4           3          DHL  
3         P3  PRODUCT_P3    120          22          DHL  
4         P1  PRODUCT_P1    423          18          UPS  
5         P2  PRODUCT_P2      0           5          GLS  
6         P3  PRODUCT_P3     53          10          DHL  
7         P4  PRODUCT_P4     22           0          UPS  
8         P1  PRODUCT_P1     94          56          GLS  
9         P1  PRODUCT_P1      9          24          GLS

df_accessories:

ID ACCESSORY_ID         NAME DEL_BY SUITABLE_FOR MANUFACTURER
100           A1  ACCESSORY_1    DHL           P1         KUNG
101           A2  ACCESSORY_2    UPS           P1          PAO
102           A3  ACCESSORY_3    GLS           P1          PAO
103           A4  ACCESSORY_4    UPS           P3          PAK
104           A5  ACCESSORY_5    DHL           P2          PAK

I'm trying to write the pandas equivalent of this SQL query:

SELECT *
FROM products a
LEFT JOIN accessories b
    ON b.DEL_BY = 'UPS'
    AND a.PRODUCT_ID = b.SUITABLE_FOR
    AND b.MANUFACTURER != 'PAK'

I tried to solve it like this:

joined = df_products.merge(df_accessories, left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left')
filtered = joined.loc[(joined['DEL_BY'] == 'UPS') & (joined['MANUFACTURER'] != 'PAK')]

But I don't think this works. I'm already stuck on the first condition, ON b.DEL_BY = 'UPS': I don't know where to put it in the pandas merge function.

I expect this result:

   ID PRODUCT_ID        NAME  STOCK  SELL_COUNT DELIVERED_BY     ID ACCESSORY_ID       NAME.1 DEL_BY SUITABLE_FOR MANUFACTURER
0   1         P1  PRODUCT_P1     12          15          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
1   2         P2  PRODUCT_P2      4           3          DHL    NaN          NaN          NaN    NaN          NaN          NaN
2   3         P3  PRODUCT_P3    120          22          DHL    NaN          NaN          NaN    NaN          NaN          NaN
3   4         P1  PRODUCT_P1    423          18          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
4   5         P2  PRODUCT_P2      0           5          GLS    NaN          NaN          NaN    NaN          NaN          NaN
5   6         P3  PRODUCT_P3     53          10          DHL    NaN          NaN          NaN    NaN          NaN          NaN
6   7         P4  PRODUCT_P4     22           0          UPS    NaN          NaN          NaN    NaN          NaN          NaN
7   8         P1  PRODUCT_P1     94          56          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
8   9         P1  PRODUCT_P1      9          24          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO

But this is what I get:

    ID_x PRODUCT_ID      NAME_x  STOCK  SELL_COUNT DELIVERED_BY   ID_y ACCESSORY_ID       NAME_y DEL_BY SUITABLE_FOR MANUFACTURER
1      1         P1  PRODUCT_P1     12          15          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
6      4         P1  PRODUCT_P1    423          18          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
12     8         P1  PRODUCT_P1     94          56          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
15     9         P1  PRODUCT_P1      9          24          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO

Thanks

sql pandas join left-join
2 Answers
2
votes

Filter the right-hand dataframe (df_accessories) before the merge:

df_products.merge(df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"'),
                  left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left',
                  suffixes=('', '.1'))

The .query(...) call is equivalent to slicing the dataframe with a boolean mask:

cond = (df_accessories['DEL_BY'] == 'UPS') & (df_accessories['MANUFACTURER'] != 'PAK')
df_products.merge(df_accessories[cond], left_on='PRODUCT_ID', right_on='SUITABLE_FOR',
                  how='left', suffixes=('', '.1'))
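
To try this end to end, here is a minimal self-contained sketch. The two dataframes are typed in by hand from the tables in the question; the variable names right and result are only illustrative. The conditions that mention only the accessories table become a pre-filter, and the equality between the two tables becomes the merge key.

```python
import pandas as pd

# Sample data copied by hand from the tables in the question.
product_ids = ["P1", "P2", "P3", "P1", "P2", "P3", "P4", "P1", "P1"]
df_products = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "PRODUCT_ID": product_ids,
    "NAME": ["PRODUCT_" + p for p in product_ids],
    "STOCK": [12, 4, 120, 423, 0, 53, 22, 94, 9],
    "SELL_COUNT": [15, 3, 22, 18, 5, 10, 0, 56, 24],
    "DELIVERED_BY": ["UPS", "DHL", "DHL", "UPS", "GLS", "DHL", "UPS", "GLS", "GLS"],
})
df_accessories = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104],
    "ACCESSORY_ID": ["A1", "A2", "A3", "A4", "A5"],
    "NAME": ["ACCESSORY_1", "ACCESSORY_2", "ACCESSORY_3", "ACCESSORY_4", "ACCESSORY_5"],
    "DEL_BY": ["DHL", "UPS", "GLS", "UPS", "DHL"],
    "SUITABLE_FOR": ["P1", "P1", "P1", "P3", "P2"],
    "MANUFACTURER": ["KUNG", "PAO", "PAO", "PAK", "PAK"],
})

# ON conditions that involve only the right table: apply them as a pre-filter.
right = df_accessories[
    (df_accessories["DEL_BY"] == "UPS") & (df_accessories["MANUFACTURER"] != "PAK")
]

# The ON condition that compares both tables becomes the merge key.
result = df_products.merge(
    right,
    left_on="PRODUCT_ID",
    right_on="SUITABLE_FOR",
    how="left",
    suffixes=("", ".1"),
)
print(result)
```

With this sample data, all nine product rows should survive, and only the four P1 rows should carry accessory A2; every other row gets NaN in the accessory columns.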

2
votes

I would do it like this: first filter df_accessories on the join conditions that do not involve df_products, then merge the result with df_products, as follows:

(
    df_accessories
    .query('MANUFACTURER != "PAK" and DEL_BY == "UPS"')
    .merge(df_products, right_on='PRODUCT_ID', left_on='SUITABLE_FOR', how='right')
    .sort_values('ID_y')
)

Output:

    ID_x ACCESSORY_ID       NAME_x DEL_BY SUITABLE_FOR MANUFACTURER  ID_y PRODUCT_ID      NAME_y  STOCK  SELL_COUNT DELIVERED_BY
0  101.0           A2  ACCESSORY_2    UPS           P1          PAO     1         P1  PRODUCT_P1     12          15          UPS
4    NaN          NaN          NaN    NaN          NaN          NaN     2         P2  PRODUCT_P2      4           3          DHL
6    NaN          NaN          NaN    NaN          NaN          NaN     3         P3  PRODUCT_P3    120          22          DHL
1  101.0           A2  ACCESSORY_2    UPS           P1          PAO     4         P1  PRODUCT_P1    423          18          UPS
5    NaN          NaN          NaN    NaN          NaN          NaN     5         P2  PRODUCT_P2      0           5          GLS
7    NaN          NaN          NaN    NaN          NaN          NaN     6         P3  PRODUCT_P3     53          10          DHL
8    NaN          NaN          NaN    NaN          NaN          NaN     7         P4  PRODUCT_P4     22           0          UPS
2  101.0           A2  ACCESSORY_2    UPS           P1          PAO     8         P1  PRODUCT_P1     94          56          GLS
3  101.0           A2  ACCESSORY_2    UPS           P1          PAO     9         P1  PRODUCT_P1      9          24          GLS
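
For contrast, the sketch below shows why the attempt in the question loses rows. Filtering after the left join acts like a SQL WHERE clause: unmatched rows hold NaN in DEL_BY, the comparison NaN == 'UPS' is False, and those rows are dropped. Filtering before the join acts like the extra ON conditions. The tables here are cut-down versions of the question's data (an assumption made to keep the example short), and the names products, accessories, after, and before are illustrative only.

```python
import pandas as pd

# Cut-down versions of the question's tables: only the columns the join needs.
products = pd.DataFrame({"ID": [1, 2, 7], "PRODUCT_ID": ["P1", "P2", "P4"]})
accessories = pd.DataFrame({
    "ACCESSORY_ID": ["A1", "A2", "A5"],
    "DEL_BY": ["DHL", "UPS", "DHL"],
    "SUITABLE_FOR": ["P1", "P1", "P2"],
    "MANUFACTURER": ["KUNG", "PAO", "PAK"],
})

# Filter AFTER the join: behaves like WHERE, so unmatched products disappear.
joined = products.merge(
    accessories, left_on="PRODUCT_ID", right_on="SUITABLE_FOR", how="left"
)
after = joined[(joined["DEL_BY"] == "UPS") & (joined["MANUFACTURER"] != "PAK")]

# Filter BEFORE the join: behaves like extra ON conditions, so every product stays.
keep = (accessories["DEL_BY"] == "UPS") & (accessories["MANUFACTURER"] != "PAK")
before = products.merge(
    accessories[keep], left_on="PRODUCT_ID", right_on="SUITABLE_FOR", how="left"
)

print(after["ID"].tolist())   # [1]
print(before["ID"].tolist())  # [1, 2, 7]
```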