Complex column dependencies in PostgreSQL

Question

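I have two columns: column1 has two distinct values (0, 1) and column2 has three distinct values ('A', 'B', 'C'). For 'A' and 'B' in column2, column1 is always 0, but when column2 is 'C', column1 has the following distribution: (0: 15%, 1: 85%). I also have another column, departmentid. In some departments, when column2 = 'C', the distribution of column1 is (0: 0%, 1: 100%). So in some cases, when I run a query like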

SELECT * FROM mytable WHERE departmentid = 42 AND column2 = 'C' AND column1 = 0 ORDER BY id LIMIT 10;

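PostgreSQL chooses an index scan on id and estimates that about 25,000 rows will match, but the table contains no rows that satisfy this filter. So the query walks the whole table through the index scan and takes far too long. If the database chose a bitmap scan instead, it would be about 50 times faster (based on other queries against this table). Indexes exist on all of these columns.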

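I have two questions:

1. I created statistics objects on column1, column2 and departmentid, but the two dependencies that interest me are missing: from column2 to column1, and from (column2, departmentid) to column1. Why? In the statistics object, the dependencies between column1 and column2 are empty. I did run ANALYZE, of course.
2. How can I speed up this query? Is there a way to do it without creating a specific 4-column index (departmentid, column2, column1, id)? The real production queries have many other filters and different orderings (this minimal query reproduces the problem).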

PostgreSQL 16.1, autovacuum and autoanalyze run daily, about 3 million rows in the table.

Update: table definition (more than 100 columns and more than 50 indexes; only the relevant ones are shown):

CREATE TABLE mytable
(
  id           SERIAL
    PRIMARY KEY,
  column1      INTEGER DEFAULT 0 NOT NULL,
  column2      TEXT              NOT NULL,
  departmentid INTEGER
);
CREATE INDEX mytable_departmentid_index
    ON mytable (departmentid);
CREATE INDEX mytable_column1_index
    ON mytable (column1);
CREATE INDEX mytable_column2_index
    ON mytable (column2);

-- tested statistic
CREATE STATISTICS mytable_column2_column1 ON column2, column1 FROM mytable;
CREATE STATISTICS mytable_departmentid_column1_column2 ON departmentid, column1, column2 FROM mytable;

departmentid has < 0.0002% NULLs; the statistics say there are no NULLs. An index on id exists because id is the primary key. All indexes are B-tree.
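For anyone reproducing this, a minimal sketch of how the contents of the two statistics objects above can be inspected after ANALYZE. It only reads the standard pg_stats_ext view; it does not change anything, and the table name is the one from the definition above.

```sql
-- Show what ANALYZE stored for the extended statistics objects on mytable.
-- An empty or NULL "dependencies" value means no functional dependency
-- was recorded for that column combination.
SELECT statistics_name,
       attnames,
       kinds,
       n_distinct,
       dependencies
FROM pg_stats_ext
WHERE tablename = 'mytable';
```

The kinds column shows which statistics types were built for each object, which helps confirm that dependency statistics were actually requested.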

Query plan (other columns removed; the numbers are actual values):

Limit  (cost=0.68..501.79 rows=10 width=2690) (actual time=35351.049..47175.738 rows=1 loops=1)
  Output: id, column1, column2, departmentid
  Buffers: shared hit=1682274 read=1646793 dirtied=1640 written=980
  I/O Timings: shared/local read=39193.882 write=9.565
  WAL: records=1637 fpi=1637 bytes=3034199
  ->  Index Scan using mytable_pkey on public.mytable  (cost=0.68..1392081.41 rows=27780 width=2690) (actual time=35351.048..47175.735 rows=1 loops=1)
        Output: id, column1, column2, departmentid
        Filter: ((mytable.departmentid = 42) AND (mytable.column2 = 'C'::text) AND (mytable.column1 = 0))
        Rows Removed by Filter: 3536431
        Buffers: shared hit=1682274 read=1646793 dirtied=1640 written=980
        I/O Timings: shared/local read=39193.882 write=9.565
        WAL: records=1637 fpi=1637 bytes=3034199
Settings: effective_cache_size = '8008368kB', effective_io_concurrency = '0', geqo_effort = '10', jit = 'off', random_page_cost = '1.2', search_path = 'public'
Planning Time: 0.503 ms
Execution Time: 47175.783 ms

Answer 1

I don't think there is an elegant way to solve this.

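One brute-force way to fix this query is to write the ordering as

ORDER BY id+0

This prevents the planner from using the primary key index to provide the ordering, so it has to fall back on something like the bitmap scan you mentioned. Of course, there may be similar cases where using the primary key to provide the ordering really is a good idea, and writing all your queries this way would prevent that from happening even when you want it to.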
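As a sketch, here is that change applied to the query from the question. Wrapping it in EXPLAIN (ANALYZE, BUFFERS) is just one way to check which plan is actually chosen; note that this executes the query.

```sql
-- Same filters as the original query; only the ORDER BY differs.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM mytable
WHERE departmentid = 42
  AND column2 = 'C'
  AND column1 = 0
ORDER BY id + 0   -- an expression rather than the bare column, so mytable_pkey cannot supply the order
LIMIT 10;
```

Because id + 0 sorts identically to id, the result set is unchanged; only the set of plans the planner can consider is different.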

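Another option is to just create an index on

(departmentid, id)

This would not be as good as the 4-column index, but it should still be much better than the existing query plan. I don't understand what your objection to the 4-column index is, so I don't know whether that objection would also apply to this two-column index.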
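A minimal sketch of that index follows. The index name is hypothetical (it just follows the naming pattern of the existing indexes), and CONCURRENTLY is an optional choice to avoid blocking writes during the build.

```sql
-- Hypothetical index name; the column order matters: equality column first, then the sort column.
CREATE INDEX CONCURRENTLY mytable_departmentid_id_index
    ON mytable (departmentid, id);
```

With an equality condition on departmentid, this index returns that department's rows already ordered by id, so the scan touches only one department's rows instead of the whole table, and the LIMIT can stop it early once enough matching rows are found.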
