Optimizing a query on a partitioned table without adding the partition key to the WHERE clause

Question

We are trying to optimize a query against a partitioned table. The query looks like this:

SELECT col1, col2
FROM partitioned_table
WHERE profile_id = '00000000-0000-0000-0000-000000000000'
AND product_id = 'product_a'
ORDER BY created_at DESC
LIMIT 500;

Definition of the parent (partitioned) table:

CREATE TABLE public.partitioned_table (
    trade_id integer NOT NULL,
    product_id character varying NOT NULL,
    settled boolean DEFAULT false NOT NULL,
    user_id public.mongo_id NOT NULL,
    profile_id uuid NOT NULL,
    created_at timestamp with time zone NOT NULL
)
PARTITION BY RANGE (created_at);

The index used for the scan:

CREATE INDEX partitioned_profile_id_product_id_trade_id_idx ON ONLY public.partitioned_table USING btree (profile_id, product_id, trade_id) INCLUDE (created_at);

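The index is defined on the partitioned table. The partitions themselves were created after the index was added to the partitioned table, so they all have the same set of indexes.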

Each partition holds one day of data, roughly 12 million rows.
We run Postgres 14.5 on AWS RDS.

Here is the query plan:

                                                       QUERY PLAN
----------------------------------------------------------------------------------------------
 Limit  (cost=944.59..945.84 rows=500 width=202) (actual time=39.501..39.691 rows=500 loops=1)
   ->  Sort  (cost=944.59..947.09 rows=1000 width=202) (actual time=39.499..39.660 rows=500 loops=1)
         Sort Key: partitioned_table.created_at DESC
         Sort Method: top-N heapsort  Memory: 290kB
         ->  Append  (cost=0.71..894.76 rows=1000 width=202) (actual time=0.030..27.204 rows=32867 loops=1)
               ->  Index Scan using partitioned_table_profile_id_product_id_trade_id_idx on partitioned_table_legacy partitioned_table_1  (cost=0.71..772.99 rows=379 width=116) (actual time=0.029..22.550 rows=32838 loops=1)
                     Index Cond: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
               ->  Index Scan using partition_20240601_profile_id_product_id_trade_id_created_idx on partition_20240601 partitioned_table_2  (cost=0.56..12.65 rows=5 width=117) (actual time=0.019..0.019 rows=0 loops=1)
                     Index Cond: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
               ->  Index Scan using partition_20240602_profile_id_product_id_trade_id_created_idx on partition_20240602 partitioned_table_3  (cost=0.56..12.65 rows=5 width=117) (actual time=0.011..0.011 rows=0 loops=1)
                     Index Cond: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
               ->  Index Scan using partition_20240603_profile_id_product_id_trade_id_created_idx on partition_20240603 partitioned_table_4  (cost=0.56..18.68 rows=8 width=117) (actual time=0.014..0.017 rows=3 loops=1)
                     Index Cond: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
               ->  Index Scan using partition_20240604_profile_id_product_id_trade_id_created_idx on partition_20240604 partitioned_table_5  (cost=0.56..4.58 rows=1 width=117) (actual time=0.013..0.014 rows=2 loops=1)
                     Index Cond: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
               ->  Index Scan using partition_20240605_profile_id_product_id_trade_id_created_idx on partition_20240605 partitioned_table_6  (cost=0.56..16.66 rows=7 width=117) (actual time=0.020..0.021 rows=2 loops=1)
                     Index Cond: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
               ->  Index Scan using partition_20240606_profile_id_product_id_trade_id_created_idx on partition_20240606 partitioned_table_7  (cost=0.56..14.67 rows=6 width=117) (actual time=0.013..0.014 rows=1 loops=1)
                     Index Cond: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
               ->  Index Scan using partition_20240607_profile_id_product_id_trade_id_created_idx on partition_20240607 partitioned_table_8  (cost=0.56..36.90 rows=17 width=117) (actual time=0.015..0.037 rows=21 loops=1)
                     Index Cond: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
               ->  Seq Scan on partition_20240608 partitioned_table_9  (cost=0.00..0.00 rows=1 width=265) (actual time=0.014..0.015 rows=0 loops=1)
                     Filter: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
               ->  Seq Scan on partition_20240609 partitioned_table_10  (cost=0.00..0.00 rows=1 width=265) (actual time=0.004..0.004 rows=0 loops=1)
                     Filter: ((profile_id = '00000000-0000-0000-0000-000000000000'::uuid) AND ((product_id)::text = 'product_a'::text))
...

The query and the query plan are obfuscated. The plan continues and performs a sequential scan on every future (empty) partition.

After looking at the query plan, I have two questions:

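1. Although we specified `ORDER BY created_at DESC`, the query plan still scans the partitions forward in chronological order. Since the sort is descending, can the scan order be reversed?
2. We proactively created two years of future partitions to reduce operational overhead. But because this query does not have the partition column `created_at` in its `WHERE` clause, it scans all future (empty) partitions even after it has fetched enough records, essentially ignoring the `LIMIT` clause. How can we make it stop scanning once it has found enough records?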

So far I have mostly been reading the documentation and have not found much insight there.

Answer 1

Bad query plan

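We want to see a plan with `Merge Append`, where the partitions are listed in (descending) order of the partition key and Postgres stops scanning as soon as the `LIMIT` is satisfied. Like here:

Postgres: partition pruning on a range-partitioned table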

But what we actually see is `Append` → `Sort` → `Limit`.
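To tell which plan shape you are getting, it can help to look only at the top few nodes of the `EXPLAIN` output. The following is a minimal sketch that reuses the obfuscated query from the question (`col1` and `col2` are the question's placeholder column names):

```sql
-- Show the executed plan, including per-node timing and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT col1, col2
FROM   partitioned_table
WHERE  profile_id = '00000000-0000-0000-0000-000000000000'
AND    product_id = 'product_a'
ORDER  BY created_at DESC
LIMIT  500;

-- What to look for:
--   Limit -> Sort -> Append
--       every partition is scanned, then everything is sorted.
--   Limit -> Append / Merge Append, with child scans marked "(never executed)"
--       the scan stopped early once the LIMIT was satisfied.
```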

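The release notes for Postgres 15 have this interesting item:

Allow ordered scans of partitions to avoid sorting in more cases (David Rowley)

Previously, a partitioned table with a `DEFAULT` partition or a `LIST` partition containing multiple values could not be used for ordered partition scans. Now they can be used if such partitions are pruned during planning.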

(There are many other improvements as well, so upgrading to a current version will help in any case!)

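Indeed, you seem to have such a default partition:

->  Index Scan using partitioned_table_profile_id_product_id_trade_id_idx on **partitioned_table_legacy** partitioned_table_1
    (cost=0.71..772.99 rows=379 width=116)
    (actual time=0.029..22.550 rows=32838 loops=1)

Bold emphasis mine. You did not tell us, but the deviating table name suggests it.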

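In Postgres 15 and later you would still need to exclude the default partition to make this work (and merge its rows in manually). But at least we now have an explanation for why it does not work.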
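One possible reading of "exclude the default partition and merge it in manually" on Postgres 14 is sketched below. It rests on assumptions that the question does not confirm: that `partitioned_table_legacy` really is the `DEFAULT` partition, and that detaching it is operationally acceptable. `col1` and `col2` are the obfuscated column names from the question.

```sql
-- Assumption: partitioned_table_legacy is the DEFAULT partition.
-- DETACH takes a lock on the parent table; try this on a copy first.
-- After detaching, rows that match no remaining partition are rejected on INSERT.
ALTER TABLE public.partitioned_table
    DETACH PARTITION public.partitioned_table_legacy;

-- Query the remaining range partitions and the detached table separately,
-- limit each branch, then merge the two small result sets.
(
    SELECT col1, col2, created_at
    FROM   public.partitioned_table
    WHERE  profile_id = '00000000-0000-0000-0000-000000000000'
    AND    product_id = 'product_a'
    ORDER  BY created_at DESC
    LIMIT  500
)
UNION ALL
(
    SELECT col1, col2, created_at
    FROM   public.partitioned_table_legacy
    WHERE  profile_id = '00000000-0000-0000-0000-000000000000'
    AND    product_id = 'product_a'
    ORDER  BY created_at DESC
    LIMIT  500
)
ORDER  BY created_at DESC
LIMIT  500;
```

Whether the planner then chooses an ordered partition scan for the first branch still depends on the indexes available on each partition, so the resulting plan is worth checking with `EXPLAIN`.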

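The second item of interest here is the bad estimate (not the core problem). Postgres expects 379 rows but finds 32838. Estimates for combined filters are notoriously hard, but this is still bad.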

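Are your column statistics up to date? Run `ANALYZE partitioned_table_legacy` and test again.

Increasing the `STATISTICS` target may help. See:
- How to check statistics targets used by ANALYZE?

But for combined filters you may really need extended statistics, like:

CREATE STATISTICS (mcv) ON profile_id, product_id FROM public.partitioned_table;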
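The first two suggestions (refreshing statistics and raising the statistics target) could look roughly like the sketch below. The target value 1000 is an arbitrary illustration rather than a recommendation, and the catalog query assumes Postgres 14, where a target of -1 means "use `default_statistics_target`".

```sql
-- Refresh statistics on the partition with the bad estimate.
ANALYZE public.partitioned_table_legacy;

-- Inspect the current per-column statistics targets
-- (-1 = fall back to default_statistics_target on Postgres 14).
SELECT attname, attstattarget
FROM   pg_attribute
WHERE  attrelid = 'public.partitioned_table_legacy'::regclass
AND    attname IN ('profile_id', 'product_id');

-- Raise the target for the two filtered columns (1000 is an example value),
-- then re-collect statistics so the new target takes effect.
ALTER TABLE public.partitioned_table_legacy
    ALTER COLUMN profile_id SET STATISTICS 1000,
    ALTER COLUMN product_id SET STATISTICS 1000;

ANALYZE public.partitioned_table_legacy;
```

After each step, compare the estimated `rows=` with the actual row count in the plan to see whether the estimate has moved closer to reality.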
