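I'm trying to improve the performance of one of our most expensive queries. I'm running EXPLAIN ANALYZE on a sandbox Postgres 15 database, which holds a much smaller dataset than the production Postgres 15 database and may not have the same data patterns as production.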
SELECT COUNT(ut.id)
FROM user_task ut
LEFT JOIN person person ON person.id = ut.person_id
LEFT JOIN health_center_group health_center_group ON health_center_group.id = ut.health_center_group_id
WHERE ut.test_group = ('')
  AND ut.status IN (('active')::user_task_status)
  AND ut.type IN (('review_consent_form')::user_task_type)
  AND ut.assignee_id IS NULL
  AND (
    person IS NULL
    OR person.health_center_permission_pending NOT IN (('exception'), ('multiple_pending'))
  )
  AND (
    NOT (
      ut.type = 'review_consent_form'
      AND ut.status IN ('active', 'exception')
      AND health_center_group.persons_imported_at_tg IS NULL
    )
  )
Aggregate (cost=2591.66..2591.67 rows=1 width=8) (actual time=4.003..4.005 rows=1 loops=1)
  ->  Nested Loop Left Join (cost=83.67..2589.84 rows=727 width=8) (actual time=0.730..3.980 rows=251 loops=1)
        Filter: ((person.* IS NULL) OR ((person.health_center_permission_pending)::text <> ALL ('{exception,multiple_pending}'::text[])))
        Rows Removed by Filter: 4
        ->  Hash Left Join (cost=83.39..480.50 rows=728 width=16) (actual time=0.582..3.710 rows=255 loops=1)
              Hash Cond: (ut.health_center_group_id = health_center_group.id)
              Filter: ((ut.type <> 'review_consent_form'::user_task_type) OR (ut.status <> ALL ('{active,exception}'::user_task_status[])) OR (health_center_group.persons_imported_at_tg IS NOT NULL))
              Rows Removed by Filter: 24
              ->  Bitmap Heap Scan on user_task ut (cost=55.68..450.84 rows=738 width=32) (actual time=0.324..3.353 rows=279 loops=1)
                    Recheck Cond: (type = 'review_consent_form'::user_task_type)
                    Filter: ((assignee_id IS NULL) AND ((test_group)::text = ''::text) AND (status = 'active'::user_task_status))
                    Rows Removed by Filter: 4416
                    Heap Blocks: exact=306
                    ->  Bitmap Index Scan on user_task_type_idx (cost=0.00..55.50 rows=4695 width=0) (actual time=0.211..0.211 rows=4735 loops=1)
                          Index Cond: (type = 'review_consent_form'::user_task_type)
              ->  Hash (cost=23.98..23.98 rows=298 width=16) (actual time=0.238..0.238 rows=298 loops=1)
                    Buckets: 1024 Batches: 1 Memory Usage: 22kB
                    ->  Seq Scan on health_center_group (cost=0.00..23.98 rows=298 width=16) (actual time=0.011..0.152 rows=298 loops=1)
        ->  Index Scan using person_pkey on person (cost=0.29..2.88 rows=1 width=1000) (actual time=0.001..0.001 rows=0 loops=255)
              Index Cond: (id = ut.person_id)
Planning Time: 1.524 ms
Execution Time: 4.105 ms
(22 rows)
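I found that changing line 11 of the query from
person IS NULL
to
ut.person_id IS NULL
significantly reduced the cost, but also significantly increased the "actual time" and "Execution Time":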
Aggregate (cost=1551.69..1551.70 rows=1 width=8) (actual time=15.309..15.311 rows=1 loops=1)
  ->  Hash Left Join (cost=1150.85..1549.88 rows=727 width=8) (actual time=13.147..15.291 rows=251 loops=1)
        Hash Cond: (ut.person_id = person.id)
        Filter: ((ut.person_id IS NULL) OR ((person.health_center_permission_pending)::text <> ALL ('{exception,multiple_pending}'::text[])))
        Rows Removed by Filter: 4
        ->  Hash Left Join (cost=83.39..480.50 rows=728 width=16) (actual time=0.932..3.174 rows=255 loops=1)
              Hash Cond: (ut.health_center_group_id = health_center_group.id)
              Filter: ((ut.type <> 'review_consent_form'::user_task_type) OR (ut.status <> ALL ('{active,exception}'::user_task_status[])) OR (health_center_group.persons_imported_at_tg IS NOT NULL))
              Rows Removed by Filter: 24
              ->  Bitmap Heap Scan on user_task ut (cost=55.68..450.84 rows=738 width=32) (actual time=0.468..2.637 rows=279 loops=1)
                    Recheck Cond: (type = 'review_consent_form'::user_task_type)
                    Filter: ((assignee_id IS NULL) AND ((test_group)::text = ''::text) AND (status = 'active'::user_task_status))
                    Rows Removed by Filter: 4416
                    Heap Blocks: exact=306
                    ->  Bitmap Index Scan on user_task_type_idx (cost=0.00..55.50 rows=4695 width=0) (actual time=0.267..0.267 rows=4735 loops=1)
                          Index Cond: (type = 'review_consent_form'::user_task_type)
              ->  Hash (cost=23.98..23.98 rows=298 width=16) (actual time=0.445..0.446 rows=298 loops=1)
                    Buckets: 1024 Batches: 1 Memory Usage: 22kB
                    ->  Seq Scan on health_center_group (cost=0.00..23.98 rows=298 width=16) (actual time=0.019..0.381 rows=298 loops=1)
        ->  Hash (cost=711.65..711.65 rows=28465 width=13) (actual time=11.988..11.989 rows=28465 loops=1)
              Buckets: 32768 Batches: 1 Memory Usage: 1508kB
              ->  Seq Scan on person (cost=0.00..711.65 rows=28465 width=13) (actual time=0.014..7.222 rows=28465 loops=1)
Planning Time: 13.285 ms
Execution Time: 15.512 ms
(24 rows)
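The cost dropped significantly, from 2591.66..2591.67 to 1551.69..1551.70. However, the "actual time" rose from 4.003..4.005 to 15.309..15.311, and the "Execution Time" rose from 4.105 ms to 15.512 ms.
I know that "cost" is only an estimate (and yes, I did run ANALYZE on the tables before doing this analysis), but I don't know how to weigh these metrics when analyzing against a sandbox database. Will this change improve or degrade performance?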
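The first set of parentheses, the one containing the cost and row estimates, is what the database thinks will happen. If that differs from what actually happens, the database's estimates are wrong, and that can be the cause of a bad plan choice. In your case, however, it is hard to draw reliable conclusions. The row counts are so low and the execution times so short that the difference could just as well be due to random variation. Repeat the experiment several times and see how the execution time varies. To get more meaningful results, you need more data.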
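As a rough illustration of both suggestions, here is a minimal Python sketch that reads EXPLAIN ANALYZE text output, lists estimated versus actual row counts per plan node, and summarizes the spread of execution times over repeated runs. The helper names (row_estimates, execution_time_ms, summarize) are hypothetical and not part of PostgreSQL or any library, and the regular expressions assume the default text format shown in the question. The sample lines are copied from the first plan above.

```python
import re
import statistics

# Assumed layout of a plan node line in the default text format:
#   [->] Node name (cost=a..b rows=EST width=w) (actual time=c..d rows=ACT loops=n)
NODE_RE = re.compile(
    r"^\s*(?:->\s*)?(?P<node>[^()]+?)\s*"
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\)\s*"
    r"\(actual time=[\d.]+\.\.[\d.]+ rows=(?P<act>\d+) loops=(?P<loops>\d+)\)"
)
EXEC_RE = re.compile(r"Execution Time:\s*([\d.]+)\s*ms")


def row_estimates(plan_text):
    """Return (node, estimated rows, actual rows per loop, loops) per plan node."""
    result = []
    for line in plan_text.splitlines():
        m = NODE_RE.match(line)
        if m:
            result.append(
                (m["node"], int(m["est"]), int(m["act"]), int(m["loops"]))
            )
    return result


def execution_time_ms(plan_text):
    """Extract the reported 'Execution Time' in milliseconds."""
    m = EXEC_RE.search(plan_text)
    if m is None:
        raise ValueError("no 'Execution Time' line found")
    return float(m.group(1))


def summarize(times_ms):
    """Summarize execution times collected from repeated runs of ONE query."""
    return {
        "runs": len(times_ms),
        "min": min(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
        # The sample standard deviation needs at least two measurements.
        "stdev": statistics.stdev(times_ms) if len(times_ms) > 1 else 0.0,
    }


# Two node lines and the timing line copied from the first plan in the question.
FIRST_PLAN = """\
Aggregate (cost=2591.66..2591.67 rows=1 width=8) (actual time=4.003..4.005 rows=1 loops=1)
-> Bitmap Heap Scan on user_task ut (cost=55.68..450.84 rows=738 width=32) (actual time=0.324..3.353 rows=279 loops=1)
Execution Time: 4.105 ms
"""

if __name__ == "__main__":
    for node, est, act, loops in row_estimates(FIRST_PLAN):
        print(f"{node}: estimated {est}, actual {act} (loops={loops})")
    print(execution_time_ms(FIRST_PLAN))
```

To compare the two variants, one would capture the output of several runs of each query, pass each output to execution_time_ms, and pass the resulting lists to summarize. If the medians are close relative to the spread, the difference is probably noise.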