My table structure:
table_a(id, company_id, approval_status, is_locked)
table_b(tba_id, status)
My query:
SELECT COUNT(id) filter (WHERE approval_status = 2
AND is_locked = true AND EXISTS
(SELECT 1 from table_b WHERE table_b.tba_id = table_a.id
AND table_b.status = 2))
FROM table_a
GROUP BY company_id
I currently have the following index, but the query is still slow:
CREATE INDEX multiple_filter_index ON table_a (approval_status, is_locked)
Can the performance of this query be improved by adding better indexes?
Here is the query plan:
HashAggregate (cost=463013.07..463013.10 rows=2 width=11) (actual time=47632.476..47632.476 rows=2 loops=1)
Group Key: table_a.company_id
-> Seq Scan on table_a (cost=0.00..3064.62 rows=100062 width=11) (actual time=0.003..23.326 rows=100062 loops=1)
SubPlan 1
-> Seq Scan on table_b (cost=0.00..477.27 rows=104 width=0) (actual time=1.430..1.430 rows=0 loops=33144)
Filter: ((tba_id = table_a.id) AND (status = 2))
Rows Removed by Filter: 17411
SubPlan 2
-> Seq Scan on table_b table_b_1 (cost=0.00..433.73 rows=5820 width=4) (never executed)
Filter: (status = 2)
Planning time: 0.902 ms
Execution time: 47632.565 ms
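Your current execution plan shows that Postgres is not using the index you defined. Instead, it does one sequential scan of table_a and then, for each of the 33144 rows that reach the EXISTS check, a full sequential scan of table_b (SubPlan 1). At about 1.43 ms per scan, that is roughly 33144 × 1.43 ms ≈ 47.4 s, which accounts for nearly all of the 47.6 s execution time.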
First, AFAIK your query does roughly the same thing as this:
SELECT COUNT(id)
FROM table_a
WHERE
approval_status = 2 AND
is_locked = true AND
EXISTS (SELECT 1 from table_b WHERE table_b.tba_id = table_a.id AND table_b.status = 2)
GROUP BY company_id;
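That is, the Postgres FILTER condition behaves much like the same logic placed in a regular WHERE clause. One difference to be aware of: with WHERE, a company_id that has no qualifying rows drops out of the result entirely, whereas the FILTER version still returns that group with a count of 0.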
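To make the comparison between the two forms concrete, here is a small sketch on made-up data. The three table_a rows and two table_b rows are hypothetical and not taken from the question; the temporary tables are created inside a transaction that is rolled back so nothing persists.

```sql
BEGIN;

-- Hypothetical sample data. Temp tables shadow the real ones only in this session.
CREATE TEMP TABLE table_a (id int PRIMARY KEY, company_id int,
                           approval_status int, is_locked boolean);
CREATE TEMP TABLE table_b (tba_id int, status int);

INSERT INTO table_a VALUES
    (1, 10, 2, true),    -- qualifies: has a table_b row with status = 2
    (2, 10, 2, true),    -- does not qualify: its table_b row has status = 1
    (3, 20, 1, false);   -- does not qualify: wrong approval_status
INSERT INTO table_b VALUES (1, 2), (2, 1);

-- FILTER form: every company_id is returned, possibly with a count of 0.
SELECT company_id,
       COUNT(id) FILTER (WHERE approval_status = 2
                           AND is_locked = true
                           AND EXISTS (SELECT 1 FROM table_b
                                       WHERE table_b.tba_id = table_a.id
                                         AND table_b.status = 2)) AS n
FROM table_a
GROUP BY company_id
ORDER BY company_id;
-- returns (10, 1) and (20, 0)

-- WHERE form: rows are removed before grouping, so company 20 disappears.
SELECT company_id, COUNT(id) AS n
FROM table_a
WHERE approval_status = 2
  AND is_locked = true
  AND EXISTS (SELECT 1 FROM table_b
              WHERE table_b.tba_id = table_a.id
                AND table_b.status = 2)
GROUP BY company_id
ORDER BY company_id;
-- returns (10, 1) only

ROLLBACK;
```

If the zero-count groups matter to the caller, the FILTER form (or an outer join back to the list of companies) would need to be kept.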
I suggest creating one index on each of the two tables:
CREATE INDEX table_a_idx ON table_a (approval_status, is_locked, company_id);
CREATE INDEX table_b_idx ON table_b (status, tba_id);
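The reason for the table_a_idx index is that we want the approval_status and is_locked filters to eliminate as many records as possible. I also included company_id in this index to cover the GROUP BY column, in the hope of avoiding extra disk reads after traversing the index.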
The table_b_idx index exists to speed up the EXISTS clause of the query.
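I also suggest using COUNT(*) instead of COUNT(id). Assuming id is never NULL (for example, because it is the primary key), both return the same number, and COUNT(*) does not need to read the id column, which table_a_idx does not contain.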
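Putting these suggestions together, one way to check whether they help is to refresh the statistics and look at the new plan. This is only a sketch: it assumes the two indexes above have been created and that id is never NULL, and whether the planner actually picks the indexes depends on your data distribution.

```sql
-- Refresh planner statistics after creating the indexes.
ANALYZE table_a;
ANALYZE table_b;

-- Run the WHERE-based form with COUNT(*) and show the actual plan and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT company_id, COUNT(*)
FROM table_a
WHERE approval_status = 2
  AND is_locked = true
  AND EXISTS (SELECT 1
              FROM table_b
              WHERE table_b.tba_id = table_a.id
                AND table_b.status = 2)
GROUP BY company_id;
```

In the output, the thing to look for is that the per-row "Seq Scan on table_b" with loops=33144 is gone, replaced by an index lookup or a single hashed or joined pass over table_b.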
Try moving some of the filtering logic into a join:
SELECT
    company_id
    , COUNT(CASE
        WHEN approval_status = 2 AND
             is_locked = TRUE AND
             b.tba_id IS NOT NULL
        THEN id
      END)
FROM table_a
LEFT JOIN (
    SELECT DISTINCT tba_id
    FROM table_b
    WHERE status = 2  -- keep the status filter from the original EXISTS
) b ON b.tba_id = table_a.id
GROUP BY
    company_id
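The DISTINCT in the joined subquery matters: if table_b can hold several rows for the same tba_id, joining it directly would repeat the table_a row and inflate the count. A minimal sketch with made-up values (one table_a row, two matching table_b rows, both hypothetical) shows the effect:

```sql
-- Inline hypothetical data via CTEs; no real tables are touched.
WITH table_a (id, company_id, approval_status, is_locked) AS (
    VALUES (1, 10, 2, true)
),
table_b (tba_id, status) AS (
    VALUES (1, 2), (1, 2)   -- two rows pointing at the same table_a.id
)
SELECT
    (SELECT COUNT(*)
     FROM table_a
     LEFT JOIN table_b b ON b.tba_id = table_a.id) AS rows_without_distinct,
    (SELECT COUNT(*)
     FROM table_a
     LEFT JOIN (SELECT DISTINCT tba_id FROM table_b) b
            ON b.tba_id = table_a.id) AS rows_with_distinct;
-- returns rows_without_distinct = 2, rows_with_distinct = 1
```

EXISTS avoids this problem by construction, since it only asks whether at least one matching row is present.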