I have these tables:
Project
id | name | version |
---|---|---|
1 | Swim | 0.0.1 |
2 | Denali | 0.0.1 |
3 | Denali | 0.0.2 |
4 | Big R | 0.0.3 |
5 | Kale | 0.0.1 |
6 | Kale | 0.0.2 |
Person
id | name |
---|---|
1 | Jack |
2 | Skye |
3 | Keith |
4 | Jim |
5 | Elizabeth |
6 | Haun |
Person_Project
id | person_id | project_id |
---|---|---|
1 | 1 | 1 |
2 | 2 | 1 |
3 | 2 | 2 |
4 | 3 | 1 |
5 | 3 | 2 |
6 | 4 | 1 |
7 | 4 | 4 |
8 | 5 | 1 |
9 | 6 | 1 |
10 | 6 | 2 |
11 | 6 | 3 |
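I want to find all persons who work on exactly the same set of projects. Based on the data above, the result should be persons 1 and 5, because both work only on project 1, and persons 2 and 3, because both work on projects 1 and 2.
Persons 4 and 6 should not be returned, because nobody else works on exactly the same projects as they do.
Query: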
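To reproduce the case, the junction table can be set up roughly like this. This is a minimal sketch: the column types and the UNIQUE constraint are assumptions, not taken from the actual schema, and only the table the queries read from is created.

```sql
-- Assumed definition; adjust types and constraints to the real schema.
CREATE TABLE person_project (
   id         int PRIMARY KEY
 , person_id  int NOT NULL
 , project_id int NOT NULL
 , UNIQUE (person_id, project_id)   -- assumption: no duplicate assignments
);

-- Sample rows exactly as listed in the Person_Project table above.
INSERT INTO person_project (id, person_id, project_id) VALUES
   (1, 1, 1)
 , (2, 2, 1), (3, 2, 2)
 , (4, 3, 1), (5, 3, 2)
 , (6, 4, 1), (7, 4, 4)
 , (8, 5, 1)
 , (9, 6, 1), (10, 6, 2), (11, 6, 3);
```

Unquoted identifiers are folded to lower case in PostgreSQL, so Person_Project and person_project refer to the same table.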
WITH projects AS (
   -- one row per person with a sorted, delimited list of project IDs
   SELECT person_id
        , string_agg(project_id::varchar, ' , ' ORDER BY project_id) AS project_ids
   FROM   Person_Project
   GROUP  BY person_id
   )
, cte AS (
   -- project lists shared by more than one person
   SELECT project_ids
   FROM   projects
   GROUP  BY project_ids
   HAVING count(*) > 1
   )
SELECT person_id
FROM   projects
WHERE  project_ids IN (SELECT project_ids FROM cte);
Output:
person_id |
---|
1 |
2 |
3 |
5 |
This can be optimized.
SELECT person_id
FROM  (
   SELECT person_id, count(*) OVER (PARTITION BY projects) AS match_ct
   FROM  (
      -- ORDER BY makes the array deterministic, so equal sets compare equal
      SELECT person_id, array_agg(project_id ORDER BY project_id) AS projects
      FROM   person_project
      GROUP  BY 1
      ) sub1
   ) sub2
WHERE  match_ct > 1;
Or with EXISTS:
WITH cte AS (
   -- ORDER BY makes the array deterministic, so equal sets compare equal
   SELECT person_id, array_agg(project_id ORDER BY project_id) AS projects
   FROM   person_project
   GROUP  BY 1
   )
SELECT person_id
FROM   cte c1
WHERE  EXISTS (
   SELECT FROM cte c2
   WHERE  c2.projects = c1.projects
   AND    c2.person_id <> c1.person_id
   );
Also, array_agg() should be faster than string_agg() (and it avoids the cast).
For a large number of persons and/or projects, create a temporary table instead of the CTE, add an index on (hash_array(projects), person_id), and compare hash values.
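A minimal sketch of that approach could look like the following. The temporary table name tmp_person_projects is hypothetical, and the extra comparison of the arrays themselves is my addition to guard against collisions of the 32-bit hash.

```sql
-- Materialize one sorted array of project IDs per person.
-- tmp_person_projects is a made-up name for illustration.
CREATE TEMP TABLE tmp_person_projects AS
SELECT person_id
     , array_agg(project_id ORDER BY project_id) AS projects
FROM   person_project
GROUP  BY 1;

-- Expression index on the array hash, with person_id appended.
CREATE INDEX ON tmp_person_projects (hash_array(projects), person_id);

-- Temporary tables are not covered by autovacuum; gather statistics manually.
ANALYZE tmp_person_projects;

SELECT p1.person_id
FROM   tmp_person_projects p1
WHERE  EXISTS (
   SELECT FROM tmp_person_projects p2
   WHERE  hash_array(p2.projects) = hash_array(p1.projects)  -- can use the index
   AND    p2.projects = p1.projects                          -- recheck actual arrays
   AND    p2.person_id <> p1.person_id
   );
```

Comparing the integer hash first keeps the index small and the lookups cheap, while the array comparison only runs for rows whose hashes already match.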
For very large sets, use hash_array_extended(projects, 0) to practically rule out hash collisions.
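As a rough illustration, the window-function query can partition on the 64-bit hash instead of the array itself. This sketch assumes a PostgreSQL version that ships hash_array_extended() (version 11 or later, as far as I know) and accepts the remaining, very small collision risk without rechecking the arrays.

```sql
SELECT person_id
FROM  (
   SELECT person_id
          -- the second argument is the seed; 0 is used here as in the text above
        , count(*) OVER (PARTITION BY hash_array_extended(projects, 0)) AS match_ct
   FROM  (
      SELECT person_id
           , array_agg(project_id ORDER BY project_id) AS projects
      FROM   person_project
      GROUP  BY 1
      ) sub1
   ) sub2
WHERE  match_ct > 1;
```

If even that risk is unacceptable, the arrays can still be compared after matching on the hash.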