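I am wondering whether several similar EXISTS conditions can be combined in a meaningful and efficient way.
Assume the following example: different activities can be assigned to a service (n-m). Independently of that, activities can be grouped into activity groups. Activity groups can be assigned to group types.
If I now want to find all services that reference certain group types, and I want to link the conditions with OR, this is relatively simple by combining EXISTS and IN.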
-- Services with at least one activity in a group of type 1 OR type 3
select *
from service
where exists (
    select 1
    from activitiy
    join activitiy_activitiy_group
        on activitiy.id = activitiy_activitiy_group.id_activitiy
    join activitiy_group
        on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
    where activitiy_group.id_type in (1, 3)
        and activitiy.id_service = service.id
);
If, on the other hand, I want to link the conditions with AND, things are not so simple. I could add several EXISTS conditions:
-- Services with an activity in a group of type 1 AND an activity in a group of type 3
select *
from service
where exists (
    select 1
    from activitiy
    join activitiy_activitiy_group
        on activitiy.id = activitiy_activitiy_group.id_activitiy
    join activitiy_group
        on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
    where activitiy_group.id_type = 1
        and activitiy.id_service = service.id
)
and exists (
    select 1
    from activitiy
    join activitiy_activitiy_group
        on activitiy.id = activitiy_activitiy_group.id_activitiy
    join activitiy_group
        on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
    where activitiy_group.id_type = 3
        and activitiy.id_service = service.id
);
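But I wonder whether this approach stays efficient with many filter values. I experimented a bit, and one option is to use a single subselect that aggregates all group type IDs related to the service into an array and compares that array with the filter values: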
-- The aggregated type IDs of a service must contain every filter value
select *
from service
where true = (
    select array_agg(activitiy_group.id_type) @> '{1,3}'::integer[]
    from activitiy
    join activitiy_activitiy_group
        on activitiy.id = activitiy_activitiy_group.id_activitiy
    join activitiy_group
        on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
    where activitiy.id_service = service.id
);
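But here, too, the question is whether this is really efficient. Can anyone assess this, or is there a more sensible alternative? I think the underlying problem is a standard one, but unfortunately I could not find any other approach on the internet.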
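I would assume a setup where the tables involved are big and writes are relatively rare. Then it makes sense to create an auxiliary MATERIALIZED VIEW (once) to speed up queries significantly: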
CREATE MATERIALIZED VIEW service_activity_types AS
SELECT sub.id_service, array_agg(sub.id_type) AS activity_types
FROM  (
   -- one row per (service, group type) pair, deduplicated and sorted once
   SELECT DISTINCT a.id_service, ag.id_type
   FROM   activitiy a
   JOIN   activitiy_activitiy_group aag ON aag.id_activitiy = a.id
   JOIN   activitiy_group ag ON ag.id = aag.id_activitiy_group
   ORDER  BY 1, 2
   ) sub
GROUP  BY 1;
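This produces a table with one row per service, each holding an array of its distinct activity group types.
(Applying DISTINCT and ORDER BY once in the subquery is typically faster.)
Create a unique index on service_activity_types to allow refreshing the MV CONCURRENTLY: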
CREATE UNIQUE INDEX service_activity_type_uni ON service_activity_types (id_service);
Refresh after relevant changes to the underlying tables:
REFRESH MATERIALIZED VIEW CONCURRENTLY service_activity_types;
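Create an index on the array column to make queries fast. There are several options. For your case, I expect a GIN index with the operator class gin__int_ops from the additional module intarray to be fastest. Install the module once per database first, then create the index: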
CREATE INDEX service_activity_type_gin_idx ON service_activity_types USING gin (activity_types gin__int_ops);
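The gin__int_ops operator class is only available once the intarray extension exists in the current database. A minimal sketch of that one-time step, assuming the contrib modules are installed on the server and your role is allowed to create extensions; it has to run before the CREATE INDEX statement above:

```sql
-- One-time setup per database; IF NOT EXISTS makes the statement safe to repeat.
CREATE EXTENSION IF NOT EXISTS intarray;
```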
Possibly even a multicolumn index.
Also, run ANALYZE once so that column statistics exist from the start:
ANALYZE service_activity_types;
Then your query can be:
SELECT id_service
FROM service_activity_types
WHERE activity_types @> '{1,3}';
And it will be very fast.
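The question asked for complete service rows, while the query above returns only id_service. A possible sketch that joins the materialized view back to service, assuming service.id is the key that activitiy.id_service refers to, as in the question's queries:

```sql
-- Fetch full service rows for services that have both group type 1 and 3.
SELECT s.*
FROM   service_activity_types sat
JOIN   service s ON s.id = sat.id_service
WHERE  sat.activity_types @> '{1,3}'::int[];  -- array must contain every listed type
```

Prefixing the statement with EXPLAIN (ANALYZE, BUFFERS) shows whether the GIN index is actually used on your data. Keep in mind that the result reflects the state of the last REFRESH, not necessarily the current contents of the base tables.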