When using Postgres EXPLAIN ANALYZE, should I trust "cost" or "actual time" more?

Question

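I'm trying to improve the performance of one of our most expensive queries. I'm running EXPLAIN ANALYZE on a sandbox Postgres 15 database, which has a much smaller data set than the production Postgres 15 database and may not have the same data patterns as production.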

SELECT COUNT(ut.id) 
FROM user_task ut
LEFT JOIN person person ON person.id = ut.person_id
LEFT JOIN health_center_group health_center_group ON health_center_group.id = ut.health_center_group_id 
WHERE ut.test_group = ('')
AND ut.status IN (('active')::user_task_status)
AND ut.type IN (('review_consent_form')::user_task_type)
AND ut.assignee_id IS NULL
AND (
  person IS NULL 
  OR person.health_center_permission_pending NOT IN (('exception'), ('multiple_pending'))
)
AND (
  NOT (
    ut.type = 'review_consent_form' 
    AND ut.status IN ('active', 'exception')
    AND health_center_group.persons_imported_at_tg IS NULL
  )
)

 Aggregate  (cost=2591.66..2591.67 rows=1 width=8) (actual time=4.003..4.005 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=83.67..2589.84 rows=727 width=8) (actual time=0.730..3.980 rows=251 loops=1)
         Filter: ((person.* IS NULL) OR ((person.health_center_permission_pending)::text <> ALL ('{exception,multiple_pending}'::text[])))
         Rows Removed by Filter: 4
         ->  Hash Left Join  (cost=83.39..480.50 rows=728 width=16) (actual time=0.582..3.710 rows=255 loops=1)
               Hash Cond: (ut.health_center_group_id = health_center_group.id)
               Filter: ((ut.type <> 'review_consent_form'::user_task_type) OR (ut.status <> ALL ('{active,exception}'::user_task_status[])) OR (health_center_group.persons_imported_at_tg IS NOT NULL))
               Rows Removed by Filter: 24
               ->  Bitmap Heap Scan on user_task ut  (cost=55.68..450.84 rows=738 width=32) (actual time=0.324..3.353 rows=279 loops=1)
                     Recheck Cond: (type = 'review_consent_form'::user_task_type)
                     Filter: ((assignee_id IS NULL) AND ((test_group)::text = ''::text) AND (status = 'active'::user_task_status))
                     Rows Removed by Filter: 4416
                     Heap Blocks: exact=306
                     ->  Bitmap Index Scan on user_task_type_idx  (cost=0.00..55.50 rows=4695 width=0) (actual time=0.211..0.211 rows=4735 loops=1)
                           Index Cond: (type = 'review_consent_form'::user_task_type)
               ->  Hash  (cost=23.98..23.98 rows=298 width=16) (actual time=0.238..0.238 rows=298 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 22kB
                     ->  Seq Scan on health_center_group  (cost=0.00..23.98 rows=298 width=16) (actual time=0.011..0.152 rows=298 loops=1)
         ->  Index Scan using person_pkey on person  (cost=0.29..2.88 rows=1 width=1000) (actual time=0.001..0.001 rows=0 loops=255)
               Index Cond: (id = ut.person_id)
 Planning Time: 1.524 ms
 Execution Time: 4.105 ms
(22 rows)

I found that changing line 11 of the query from

person IS NULL

to

ut.person_id IS NULL

significantly reduced the cost, but also significantly increased the "actual time" and "execution time":

Aggregate  (cost=1551.69..1551.70 rows=1 width=8) (actual time=15.309..15.311 rows=1 loops=1)
   ->  Hash Left Join  (cost=1150.85..1549.88 rows=727 width=8) (actual time=13.147..15.291 rows=251 loops=1)
         Hash Cond: (ut.person_id = person.id)
         Filter: ((ut.person_id IS NULL) OR ((person.health_center_permission_pending)::text <> ALL ('{exception,multiple_pending}'::text[])))
         Rows Removed by Filter: 4
         ->  Hash Left Join  (cost=83.39..480.50 rows=728 width=16) (actual time=0.932..3.174 rows=255 loops=1)
               Hash Cond: (ut.health_center_group_id = health_center_group.id)
               Filter: ((ut.type <> 'review_consent_form'::user_task_type) OR (ut.status <> ALL ('{active,exception}'::user_task_status[])) OR (health_center_group.persons_imported_at_tg IS NOT NULL))
               Rows Removed by Filter: 24
               ->  Bitmap Heap Scan on user_task ut  (cost=55.68..450.84 rows=738 width=32) (actual time=0.468..2.637 rows=279 loops=1)
                     Recheck Cond: (type = 'review_consent_form'::user_task_type)
                     Filter: ((assignee_id IS NULL) AND ((test_group)::text = ''::text) AND (status = 'active'::user_task_status))
                     Rows Removed by Filter: 4416
                     Heap Blocks: exact=306
                     ->  Bitmap Index Scan on user_task_type_idx  (cost=0.00..55.50 rows=4695 width=0) (actual time=0.267..0.267 rows=4735 loops=1)
                           Index Cond: (type = 'review_consent_form'::user_task_type)
               ->  Hash  (cost=23.98..23.98 rows=298 width=16) (actual time=0.445..0.446 rows=298 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 22kB
                     ->  Seq Scan on health_center_group  (cost=0.00..23.98 rows=298 width=16) (actual time=0.019..0.381 rows=298 loops=1)
         ->  Hash  (cost=711.65..711.65 rows=28465 width=13) (actual time=11.988..11.989 rows=28465 loops=1)
               Buckets: 32768  Batches: 1  Memory Usage: 1508kB
               ->  Seq Scan on person  (cost=0.00..711.65 rows=28465 width=13) (actual time=0.014..7.222 rows=28465 loops=1)
 Planning Time: 13.285 ms
 Execution Time: 15.512 ms
(24 rows)

The cost dropped significantly, from 2591.66..2591.67 to 1551.69..1551.70. However, the "actual time" increased from 4.003..4.005 to 15.309..15.311, and the "execution time" increased from 4.105 ms to 15.512 ms.

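I know that "cost" is only an estimate (and yes, I did run ANALYZE on the tables before running EXPLAIN ANALYZE), but I don't know how to weigh these metrics when analyzing against a sandbox database. Will this change make performance better or worse?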

Answer 1

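The actual execution times and actual row counts are real measurements, so they are reliable information about what actually happened.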
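One detail worth keeping in mind when reading the "actual" values: for a node that runs more than once, EXPLAIN ANALYZE reports the time and row count as per-loop averages, so the node's total time is roughly the reported time multiplied by loops. The following is a minimal Python sketch of that arithmetic, applied to the Index Scan line from the first plan in the question. The function name total_node_time_ms is made up for illustration, and the regular expression assumes the default text output format.

```python
import re

# Matches the "actual" parenthesis of a node line in EXPLAIN ANALYZE text output.
# Assumption: default text format, as pasted in the question.
ACTUAL = re.compile(
    r"\(actual time=(?P<first>\d+\.\d+)\.\.(?P<total>\d+\.\d+) "
    r"rows=(?P<rows>\d+) loops=(?P<loops>\d+)\)"
)


def total_node_time_ms(plan_line: str) -> float:
    """Return the per-loop total time multiplied by the loop count (ms)."""
    match = ACTUAL.search(plan_line)
    if match is None:
        raise ValueError("no 'actual time' section found in this line")
    per_loop_ms = float(match.group("total"))
    loops = int(match.group("loops"))
    return per_loop_ms * loops


# Node line copied from the first plan in the question.
line = (
    "->  Index Scan using person_pkey on person  "
    "(cost=0.29..2.88 rows=1 width=1000) "
    "(actual time=0.001..0.001 rows=0 loops=255)"
)

print(f"{total_node_time_ms(line):.3f} ms")
```

For this node the product is about a quarter of a millisecond. Since the per-loop figure is rounded in the plan output, the result is only an approximation.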

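The first parenthesis, containing the cost and row estimates, is what the database thinks will happen. If that differs from what actually happened, the database's estimates are wrong, which can be the cause of a bad plan choice. However, it is hard to draw a reliable conclusion in your case. The row counts are so low and the execution times so short that the difference could just as well be due to random variation. Repeat the experiment a few times and see how the execution time varies. To get more significant results, you need more data.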
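As a rough illustration of comparing what the database thinks with what actually happened, here is a small Python sketch that pulls the estimated and actual row counts out of plan lines and prints their ratio. The helper row_estimate_ratio is a hypothetical name, the regular expression assumes the default text output format, and the three sample lines are copied from the plans in the question.

```python
import re

# Matches both parentheses of a node line: (cost=... rows=EST ...) (actual ... rows=ACT ...).
# Assumption: default EXPLAIN ANALYZE text format.
NODE = re.compile(
    r"\(cost=\d+\.\d+\.\.\d+\.\d+ rows=(?P<est>\d+) width=\d+\) "
    r"\(actual time=\d+\.\d+\.\.\d+\.\d+ rows=(?P<act>\d+) loops=\d+\)"
)


def row_estimate_ratio(plan_line):
    """Return estimated rows / actual rows, or None if the line is not a plan node."""
    match = NODE.search(plan_line)
    if match is None:
        return None
    estimated = int(match.group("est"))
    actual = int(match.group("act"))
    # Treat 0 actual rows as 1 to avoid dividing by zero.
    return estimated / max(actual, 1)


# Node lines copied from the plans in the question.
PLAN_LINES = [
    "->  Nested Loop Left Join  (cost=83.67..2589.84 rows=727 width=8) (actual time=0.730..3.980 rows=251 loops=1)",
    "->  Bitmap Heap Scan on user_task ut  (cost=55.68..450.84 rows=738 width=32) (actual time=0.324..3.353 rows=279 loops=1)",
    "->  Seq Scan on person  (cost=0.00..711.65 rows=28465 width=13) (actual time=0.014..7.222 rows=28465 loops=1)",
]

for plan_line in PLAN_LINES:
    ratio = row_estimate_ratio(plan_line)
    print(f"estimated/actual rows = {ratio:.2f}")
```

A ratio near 1 means the estimate matched reality for that node. For the join and the scan on user_task the estimate is high by a factor of roughly 2.6 to 2.9, while the scan on person is estimated exactly.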
