Why does PostgreSQL sort a result set that already appears to be sorted?

I'm trying to optimize the following query for a school assignment:

SELECT
    DATE(b.book_date),
    SUM(b.total_amount) revenue,
    COUNT(DISTINCT(t.passenger_id)) count_passengers
FROM bookings b
JOIN tickets t ON t.book_ref = b.book_ref
GROUP BY
    DATE(b.book_date)
ORDER BY
    COUNT(DISTINCT(t.passenger_id)) DESC,
    SUM(b.total_amount) DESC;

Without creating any indexes, the planner gives me the following EXPLAIN ANALYZE output:

|QUERY PLAN                                                                                                                                                          |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|Sort  (cost=20001452121.83..20001453243.21 rows=448552 width=44) (actual time=10862.729..10862.752 rows=392 loops=1)                                                |
|  Sort Key: (count(DISTINCT t.passenger_id)) DESC, (sum(b.total_amount)) DESC                                                                                       |
|  Sort Method: quicksort  Memory: 55kB                                                                                                                              |
|  ->  GroupAggregate  (cost=20001359986.85..20001396213.70 rows=448552 width=44) (actual time=8131.457..10862.442 rows=392 loops=1)                                 |
|        Group Key: (date(b.book_date))                                                                                                                              |
|        ->  Sort  (cost=20001359986.85..20001367361.50 rows=2949857 width=22) (actual time=8131.394..8329.844 rows=2949857 loops=1)                                 |
|              Sort Key: (date(b.book_date))                                                                                                                         |
|              Sort Method: external merge  Disk: 95136kB                                                                                                            |
|              ->  Merge Join  (cost=20000859819.02..20000921997.07 rows=2949857 width=22) (actual time=6064.572..7486.794 rows=2949857 loops=1)                     |
|                    Merge Cond: (b.book_ref = t.book_ref)                                                                                                           |
|                    ->  Sort  (cost=10000342915.67..10000348193.45 rows=2111110 width=21) (actual time=854.300..1138.458 rows=2111110 loops=1)                      |
|                          Sort Key: b.book_ref                                                                                                                      |
|                          Sort Method: external merge  Disk: 66024kB                                                                                                |
|                          ->  Seq Scan on bookings b  (cost=10000000000.00..10000034558.10 rows=2111110 width=21) (actual time=90.339..215.696 rows=2111110 loops=1)|
|                    ->  Sort  (cost=10000516903.35..10000524278.00 rows=2949857 width=19) (actual time=5210.197..5407.427 rows=2949857 loops=1)                     |
|                          Sort Key: t.book_ref                                                                                                                      |
|                          Sort Method: external sort  Disk: 95320kB                                                                                                 |
|                          ->  Seq Scan on tickets t  (cost=10000000000.00..10000078913.57 rows=2949857 width=19) (actual time=0.134..241.174 rows=2949857 loops=1)  |
|Planning Time: 0.121 ms                                                                                                                                             |
|JIT:                                                                                                                                                                |
|  Functions: 16                                                                                                                                                     |
|  Options: Inlining true, Optimization true, Expressions true, Deforming true                                                                                       |
|  Timing: Generation 0.834 ms, Inlining 7.090 ms, Optimization 51.113 ms, Emission 32.119 ms, Total 91.156 ms                                                       |
|Execution Time: 10890.239 ms                                                                                                                                        |

To optimize the join, I created the following indexes:

CREATE INDEX idx_bookings_bd_ta_bref ON bookings USING btree(book_date, total_amount, book_ref);
CREATE INDEX idx_tickets_bref ON tickets USING hash(book_ref);

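The main idea is that the planner can perform the join by iterating over the index on bookings and, for each bookings row, fetching the matching tickets rows through the hash index. The joined result set would then already be sorted by book_date, which eliminates the sort of the result set. After disabling seqscan and some indexes, I finally got the planner to do what I want, which results in the following EXPLAIN ANALYZE output: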

|QUERY PLAN                                                                                                                                                                        |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|Sort  (cost=966377.54..967498.92 rows=448552 width=44) (actual time=6961.514..6961.535 rows=392 loops=1)                                                                          |
|  Sort Key: (count(DISTINCT t.passenger_id)) DESC, (sum(b.total_amount)) DESC                                                                                                     |
|  Sort Method: quicksort  Memory: 55kB                                                                                                                                            |
|  ->  GroupAggregate  (cost=874242.56..910469.41 rows=448552 width=44) (actual time=4283.854..6961.216 rows=392 loops=1)                                                          |
|        Group Key: (date(b.book_date))                                                                                                                                            |
|        ->  Sort  (cost=874242.56..881617.20 rows=2949857 width=22) (actual time=4283.792..4475.416 rows=2949857 loops=1)                                                         |
|              Sort Key: (date(b.book_date))                                                                                                                                       |
|              Sort Method: external merge  Disk: 95136kB                                                                                                                          |
|              ->  Nested Loop  (cost=0.43..436252.77 rows=2949857 width=22) (actual time=64.342..3727.674 rows=2949857 loops=1)                                                   |
|                    ->  Index Only Scan using idx_bookings_bd_ta_bref on bookings b  (cost=0.43..73467.08 rows=2111110 width=21) (actual time=0.049..172.735 rows=2111110 loops=1)|
|                          Heap Fetches: 0                                                                                                                                         |
|                    ->  Index Scan using idx_tickets_bref on tickets t  (cost=0.00..0.15 rows=2 width=19) (actual time=0.001..0.001 rows=1 loops=2111110)                         |
|                          Index Cond: (book_ref = b.book_ref)                                                                                                                     |
|                          Rows Removed by Index Recheck: 0                                                                                                                        |
|Planning Time: 0.139 ms                                                                                                                                                           |
|JIT:                                                                                                                                                                              |
|  Functions: 10                                                                                                                                                                   |
|  Options: Inlining true, Optimization true, Expressions true, Deforming true                                                                                                     |
|  Timing: Generation 0.448 ms, Inlining 6.415 ms, Optimization 34.103 ms, Emission 23.769 ms, Total 64.734 ms                                                                     |
|Execution Time: 6971.385 ms                                                                                                                                                       |

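What confuses me is that, even though the result set is already sorted by book_date, the planner insists on performing an external (on-disk) sort anyway, which slows things down considerably. Why does PostgreSQL sort something that should already be sorted, and how can I stop it from doing so? Note that I don't need the solution to the assignment; I only want to know why the planner is doing something it shouldn't actually have to do.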

In case it's relevant: I'm using PostgreSQL 14, and the book_date column that causes the problem is a timestamptz column.

database postgresql query-optimization relational-database database-indexes
1 Answer

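The sort is required because the query groups by DATE(b.book_date), and that expression is not the leading key of the index on bookings. The planner determined that an index scan is more efficient than a table scan, but it cannot use the ordering on b.book_date as a substitute for an ordering on DATE(b.book_date): it matches sort orders by exact expression and does not infer that DATE() preserves the order of its input.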
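The behaviour can be reproduced on a throwaway table. This is a minimal sketch, assuming a hypothetical temp table `demo_bookings_ts` and index `demo_bookings_ts_idx` (neither is from the question): with sequential scans and hash aggregation disabled, grouping by the indexed timestamptz column itself should need no Sort node, while grouping by DATE() of the same column should still get one. Exact plan text may vary between PostgreSQL versions.

```sql
-- Hypothetical throwaway table mirroring bookings.book_date (timestamptz)
-- with a plain b-tree index on the raw column.
CREATE TEMP TABLE demo_bookings_ts (book_date timestamptz NOT NULL);
CREATE INDEX demo_bookings_ts_idx ON demo_bookings_ts (book_date);

-- Steer the planner toward sorted input: no sequential scan and no
-- HashAggregate (HashAggregate cannot evaluate DISTINCT aggregates, which is
-- why the question's COUNT(DISTINCT ...) plans use GroupAggregate anyway).
SET enable_seqscan = off;
SET enable_hashagg = off;

-- 1) GROUP BY the indexed column: the index order satisfies the grouping,
--    so GroupAggregate can read the index scan directly, with no Sort.
EXPLAIN (COSTS OFF)
SELECT book_date, count(*)
FROM demo_bookings_ts
GROUP BY book_date;

-- 2) GROUP BY DATE(book_date): the planner does not treat "sorted by
--    book_date" as "sorted by DATE(book_date)", so a Sort node is added.
EXPLAIN (COSTS OFF)
SELECT DATE(book_date), count(*)
FROM demo_bookings_ts
GROUP BY DATE(book_date);
```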
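Changing the type of book_date from TIMESTAMPTZ to DATE (at the cost of losing the time of day) and then grouping by b.book_date would probably eliminate the extra sort. Turning the index into an expression (functional) index, with DATE(book_date) instead of book_date as the leading key, could also achieve the desired result. Note, however, that because book_date is timestamptz, DATE(book_date) depends on the session's TimeZone setting and is therefore not IMMUTABLE, so PostgreSQL will refuse to use it in an index expression. An expression with a fixed time zone, such as DATE(book_date AT TIME ZONE 'UTC'), is immutable and can be indexed, provided the query groups by exactly the same expression.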
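The following sketch illustrates the expression-index route. It assumes a hypothetical temp table `demo_bookings_utc` and index `demo_bookings_utc_day_idx`, and it assumes that grouping by UTC calendar days is acceptable for the report (that is a change in semantics, not something the question specifies). It first shows why DATE(book_date) cannot be indexed directly, then checks that an index on the UTC-pinned expression lets GroupAggregate consume rows in index order without a separate Sort. Whether the planner actually picks such an index for the real bookings/tickets join is not guaranteed.

```sql
-- DATE(timestamptz) uses the session TimeZone, so the same instant can map
-- to different days; this is why the expression is only STABLE.
SET TimeZone = 'UTC';
SELECT DATE(timestamptz '2017-08-15 23:30:00+00');  -- 2017-08-15
SET TimeZone = 'Asia/Tokyo';
SELECT DATE(timestamptz '2017-08-15 23:30:00+00');  -- 2017-08-16
RESET TimeZone;

-- Hypothetical stand-in for bookings (name and column are illustrative).
CREATE TEMP TABLE demo_bookings_utc (book_date timestamptz NOT NULL);

-- Would fail with "functions in index expression must be marked IMMUTABLE":
-- CREATE INDEX ON demo_bookings_utc ((DATE(book_date)));

-- Pinning the time zone makes the expression IMMUTABLE, so it can be indexed.
CREATE INDEX demo_bookings_utc_day_idx
    ON demo_bookings_utc ((DATE(book_date AT TIME ZONE 'UTC')));

-- Steer the planner toward sorted input, as in the question.
SET enable_seqscan = off;
SET enable_hashagg = off;

-- Group by exactly the indexed expression: the index order already matches
-- the GROUP BY, so no Sort node should appear under GroupAggregate.
EXPLAIN (COSTS OFF)
SELECT DATE(book_date AT TIME ZONE 'UTC') AS book_day, count(*)
FROM demo_bookings_utc
GROUP BY DATE(book_date AT TIME ZONE 'UTC');
```

Even with such an index, the final ORDER BY on the two aggregates still needs its own Sort, but that one only sorts the per-day result rows (392 in the question's plans), which is cheap.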

© www.soinside.com 2019 - 2024. All rights reserved.