I have this table in Postgres:
CREATE TABLE public.products
(
    id character varying NOT NULL,
    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
    name character varying NOT NULL
);
For id I'm using ULIDs (note that the column type in Postgres is varchar with no fixed length).
I'm ready to change the column type if needed.
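Since you're open to changing the column type: the DDL above declares no primary key, and canonical ULIDs are always 26 characters. A minimal sketch of tightening the schema, assuming id is meant to be unique (adjust names to your setup):

```sql
-- Assumption: id holds canonical 26-character ULIDs and is unique.
ALTER TABLE public.products
    ALTER COLUMN id TYPE character varying(26);

-- Adding a primary key also creates the backing unique B-tree index.
ALTER TABLE public.products
    ADD CONSTRAINT products_pkey PRIMARY KEY (id);
```

This doesn't change lookup semantics, but it guarantees there is an index the planner could use for id lookups.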
SELECT "products".*
FROM "products"
WHERE "id" = ANY('{01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11PB3N1Q9TXME6KW1B,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0WJ53MH0WKNR65CBX1,01HQDMCF0ZV7TYZWRR6RN5V4QT,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YCJ8EFYCBKHWN4VQN,01HQDMCF0Z46AY42FC8FR953D8,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0ZBQ3A70GE8K41RW1V,01HQDMCF0WNYW4ZH5G0M8MDAQA,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0W24YQHH4EKN91S0JY,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0YEQR7NJJJW88XNTW9,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF106M1667NNPTDBKQDB,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0ZMPTQ0GHG0V87W0J4,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WHNJ0P9BGVEFJE2JG,01HQDMCF109V71KW4MTFWVRTFR,and a lot more ~ 20k of them}')
LIMIT 37520 OFFSET 0
This is slow.
IMPORTANT: I'm using a lot (~20k) of values in the ANY() clause on id. I think this is the problem.
If I run it with EXPLAIN ANALYZE, it says:
Limit (cost=85.51..90.82 rows=154 width=172) (actual time=1.765..1.799 rows=138 loops=1)
-> Seq Scan on products (cost=85.51..90.82 rows=154 width=172) (actual time=1.764..1.791 rows=138 loops=1)
Filter: ((id)::text = ANY ('{01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11PB3N1Q9TXME6KW1B,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0WJ53MH0WKNR65CBX1,01HQDMCF0ZV7TYZWRR6RN5V4QT,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YCJ8EFYCBKHWN4VQN,01HQDMCF0Z46AY42FC8FR953D8,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0ZBQ3A70GE8K41RW1V,01HQDMCF0WNYW4ZH5G0M8MDAQA,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0W24YQHH4EKN91S0JY,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0YEQR7NJJJW88XNTW9,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF106M1667NNPTDBKQDB,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0ZMPTQ0GHG0V87W0J4,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WHNJ0P9BGVEFJE2JG,01HQDMCF109V71KW4MTFWVRTFR,and a lot more ~ 20k of them}'::text[]))
Rows Removed by Filter: 16
Planning Time: 9.769 ms
Execution Time: 1.878 ms
I have tried:
CREATE INDEX product_id_idx ON products(id);
and
CREATE INDEX product_id_idx ON products USING HASH(id);
and
CREATE INDEX product_id_pattern_idx ON products USING btree (id text_pattern_ops);
but none of them fixes the extreme slowness.
What index can I create to improve this query?
=ANY(ARRAY[]) seems unwilling to use the index you provide and instead sequentially scans the freshly VACUUM ANALYZEd table with 200k rows. Demo 1:
SELECT "products".*
FROM "products"
WHERE "id" = ANY(ARRAY[(random())::text,(random())::text,...around 400 of these ...(random())::text])
LIMIT 37520 OFFSET 0
QUERY PLAN
Limit (cost=0.00..748971.00 rows=426 width=59) (actual time=17519.944..17519.952 rows=0 loops=1)
Output: id, created_at, name
-> Seq Scan on public.products (cost=0.00..748971.00 rows=426 width=59) (actual time=14365.282..14365.286 rows=0 loops=1)
Output: id, created_at, name
Filter: (products.id = ANY (ARRAY[(random())::text,(random())::text,...around 400 of these ...(random())::text]))
Rows Removed by Filter: 200000
Planning Time: 1.387 ms
JIT:
Functions: 4
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 10.140 ms, Inlining 81.166 ms, Optimization 1052.392 ms, Emission 2021.084 ms, Total 3164.782 ms
Execution Time: 17710.315 ms
With IN (SELECT UNNEST(ARRAY[])), the problem goes away. Demo 2:
SELECT "products".*
FROM "products"
WHERE "id" IN (SELECT UNNEST(ARRAY[(random())::text,(random())::text,...around 400 of these ...(random())::text]))
LIMIT 37520 OFFSET 0
QUERY PLAN
Limit (cost=6.41..1481.91 rows=426 width=59) (actual time=0.816..0.817 rows=0 loops=1)
Output: products.id, products.created_at, products.name
-> Nested Loop (cost=6.41..1481.91 rows=426 width=59) (actual time=0.815..0.816 rows=0 loops=1)
Output: products.id, products.created_at, products.name
-> HashAggregate (cost=6.41..8.41 rows=200 width=32) (actual time=0.198..0.251 rows=426 loops=1)
Output: (unnest(ARRAY[(random())::text, ...400 of these..., (random())::text]))
Group Key: unnest(ARRAY[(random())::text, ...400 of these..., (random())::text])
Batches: 1 Memory Usage: 77kB
-> ProjectSet (cost=0.00..5.34 rows=426 width=32) (actual time=0.069..0.102 rows=426 loops=1)
Output: unnest(ARRAY[(random())::text, ...400 of these..., (random())::text])
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
-> Index Scan using product_id_idx2 on public.products (cost=0.00..7.36 rows=1 width=59) (actual time=0.001..0.001 rows=0 loops=426)
Output: products.id, products.created_at, products.name
Index Cond: (products.id = (unnest(ARRAY[(random())::text, ...400 of these..., (random())::text])))
Planning Time: 1.050 ms
Execution Time: 0.989 ms
Now it uses the index, and the time drops from 17 s to under 1 ms.
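A related rewrite that also gives the planner discrete rows to join against the index is a VALUES list. A sketch against the same table, reusing two of the ULIDs from the question (one row per lookup value):

```sql
-- Sketch: same idea as the unnest() rewrite, expressed as a join.
-- The planner can nested-loop this derived table against the index on id,
-- instead of re-checking a 20k-element array for every heap row.
SELECT p.*
FROM products AS p
JOIN (VALUES ('01HQDMCF0S7QWQYBP2FW9HK8DS'),
             ('01HQDMCF11PB3N1Q9TXME6KW1B')
      -- ... one row per lookup value ...
     ) AS v(id) ON p.id = v.id;
```

As a design note: both this and the unnest() form turn a single filter with a huge array literal into a set of rows, which is what lets the planner pick an index-driven join; which of the two is faster for your data is worth checking with EXPLAIN ANALYZE.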