Nested intersection

Question · Votes: 0 · Answers: 2

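I have a table of sold items containing customer_id and item_name. I need to create a new table that lists pairs of customers together with the number of items they have in common, with the columns customer_a_id, customer_b_id, intersected_items_count.

I wrote a PL/SQL procedure that does this with a cursor and nested for loops, but with a million customers that means 1m * 1m loop iterations.

My question: is there a pure SQL way to do this nested intersection (every row in the table intersected with every other row)?

My table looks like this: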

customer_id   item
1              Meat 
1              Rice 
2              Meat
2              Soups 
3              Pasta 

Requested output:

customer_a_id customer_b_id intersected_items
1              2             1
1              3             0
2              1             1
2              3             0
3              1             0
3              2             0
sql oracle plsql
2 Answers
-1
votes

A self-join of the customer table (here assumed to have the columns id and item) will generate the desired result set:

    SELECT c1.id        customer_a_id 
         , c2.id        customer_b_id
         , COUNT(*)     intersected_items
      FROM customer c1
      JOIN customer c2
        ON (
                 c1.id <> c2.id
             AND c1.item = c2.item
           )
  GROUP BY c1.id
         , c2.id
         ;

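An obvious optimization is to join on c1.id < c2.id instead, so that each pair is returned only once.

Addendum

As @JuanCarlosOropeza pointed out, the solution above does not include id pairs whose item sets do not intersect. That is by design, given the table size of 10^6 mentioned in the question.

However, for completeness, and acknowledging that the OP did not ask to skip these pairs, the following query also produces the non-intersecting pairs: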
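As a minimal sketch of that optimization, assuming the same `customer` table with `id` and `item` columns as in the query above, the only change is the comparison operator in the join condition:

```sql
-- Sketch: each unordered pair appears once, with the smaller id
-- in customer_a_id. Pairs with no shared items are still omitted.
SELECT c1.id        customer_a_id
     , c2.id        customer_b_id
     , COUNT(*)     intersected_items
  FROM customer c1
  JOIN customer c2
    ON (
             c1.id < c2.id
         AND c1.item = c2.item
       )
 GROUP BY c1.id
        , c2.id
;
```

With the sample data from the question this returns a single row, (1, 2, 1), instead of the two mirrored rows (1, 2, 1) and (2, 1, 1). If both orderings are needed, the mirrored row can be derived from the stored one.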

    SELECT x.customer_a_id
         , x.customer_b_id
         , COALESCE(matches.intersected_items, 0)   intersected_items
      FROM (
                SELECT c_all_1.id        customer_a_id 
                     , c_all_2.id        customer_b_id
                  FROM customer c_all_1
            CROSS JOIN customer c_all_2
                 WHERE c_all_1.id < c_all_2.id
              GROUP BY c_all_1.id
                     , c_all_2.id
           ) x
 LEFT JOIN (
                SELECT c1.id        customer_a_id 
                     , c2.id        customer_b_id
                     , COUNT(*)     intersected_items
                  FROM customer c1
                  JOIN customer c2
                    ON (
                             c1.id < c2.id
                         AND c1.item = c2.item
                       )
              GROUP BY c1.id
                     , c2.id
           ) matches
        ON (
                matches.customer_a_id = x.customer_a_id
            AND matches.customer_b_id = x.customer_b_id
           )
  ORDER BY intersected_items desc
         , customer_a_id 
         , customer_b_id 
         ;
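Since the question asks for the result to be stored in a new table, one possible sketch uses Oracle's `CREATE TABLE ... AS SELECT`. The target table name `customer_intersections` is hypothetical, and the `customer` table with `id` and `item` columns follows the queries above. This variant stores only intersecting pairs, on the assumption that a missing pair can be read as a count of 0, which keeps the table far smaller than the full set of pairs.

```sql
-- Hypothetical target table name: customer_intersections.
-- Stores one row per intersecting pair (customer_a_id < customer_b_id).
CREATE TABLE customer_intersections AS
SELECT c1.id        customer_a_id
     , c2.id        customer_b_id
     , COUNT(*)     intersected_items
  FROM customer c1
  JOIN customer c2
    ON (
             c1.id < c2.id
         AND c1.item = c2.item
       )
 GROUP BY c1.id
        , c2.id
;
```

If the zero-count pairs must be stored as well, the same statement can wrap the larger query above instead, at the cost of a row for every pair of customers.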

1
vote

I would do this with a cross join and left joins:

select c1.customer_id, c2.customer_id, count(t2.item) as num_intersected_items
from (select distinct customer_id from t) c1 cross join
     (select distinct customer_id from t) c2 left join
     t t1
     on t1.customer_id = c1.customer_id left join
     t t2
     on t2.customer_id = c2.customer_id and t2.item = t1.item
where c1.customer_id <> c2.customer_id
group by c1.customer_id, c2.customer_id;

This version gives you control over the customer ids: they can come from a different table, which may include customers with no items.
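As a sketch of that variation, assume a hypothetical `customers` table with a `customer_id` column that lists every customer, including those with no rows in `t`. The two derived tables are simply replaced by that table:

```sql
-- Assumption: customers(customer_id) is a hypothetical table that is
-- not part of the question; t(customer_id, item) is the sales table.
select c1.customer_id as customer_a_id,
       c2.customer_id as customer_b_id,
       count(t2.item) as num_intersected_items
from customers c1 cross join
     customers c2 left join
     t t1
     on t1.customer_id = c1.customer_id left join
     t t2
     on t2.customer_id = c2.customer_id and t2.item = t1.item
where c1.customer_id <> c2.customer_id
group by c1.customer_id, c2.customer_id;
```

A customer with no items produces a single row with NULL `t1` columns, so nothing matches in `t2` and `count(t2.item)` yields 0 for every pair involving that customer.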

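If all the customers come from the same table, the counts for the intersecting pairs can also be obtained with a single left join: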

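select t1.customer_id, t2.customer_id, count(t2.item) as num_intersected_items
from t t1 left join
     t t2
     on t1.item = t2.item
-- Note: this filter removes each customer's match with itself, so pairs
-- that share no items are not returned by this shorter form.
where t1.customer_id <> t2.customer_id
group by t1.customer_id, t2.customer_id;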