This query is slow because I'm using a lot of id values in the ANY() clause, right?


I have this table in Postgres:

CREATE TABLE public.products 
(
    id character varying NOT NULL,
    created_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
    name character varying NOT NULL
);

As the id I'm using ULIDs (note that the column type in Postgres is varchar, with no fixed length).

I'm ready to change the column type if needed.
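
For example, a change along these lines would be an option (just a sketch, assuming the canonical 26-character ULID text encoding and that no other objects depend on the column type):

-- Sketch: pin the ULID column to its fixed canonical length (26 characters).
ALTER TABLE public.products
    ALTER COLUMN id TYPE character(26);

-- Sketch: make it the primary key, which also creates a unique btree index on id.
ALTER TABLE public.products
    ADD CONSTRAINT products_pkey PRIMARY KEY (id);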

SELECT "products".* 
FROM "products" 
WHERE "id" = ANY('{01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11PB3N1Q9TXME6KW1B,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0WJ53MH0WKNR65CBX1,01HQDMCF0ZV7TYZWRR6RN5V4QT,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YCJ8EFYCBKHWN4VQN,01HQDMCF0Z46AY42FC8FR953D8,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0ZBQ3A70GE8K41RW1V,01HQDMCF0WNYW4ZH5G0M8MDAQA,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0W24YQHH4EKN91S0JY,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0YEQR7NJJJW88XNTW9,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF106M1667NNPTDBKQDB,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0ZMPTQ0GHG0V87W0J4,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WHNJ0P9BGVEFJE2JG,01HQDMCF109V71KW4MTFWVRTFR,and a lot more ~ 20k of them}') 
LIMIT 37520 OFFSET 0

This is slow.

Important

I'm using a lot (~20k) of id values in the ANY() clause. I think that's the problem.

If I run EXPLAIN ANALYZE, it says:

Limit  (cost=85.51..90.82 rows=154 width=172) (actual time=1.765..1.799 rows=138 loops=1)
  ->  Seq Scan on products  (cost=85.51..90.82 rows=154 width=172) (actual time=1.764..1.791 rows=138 loops=1)
        Filter: ((id)::text = ANY ('{01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11PB3N1Q9TXME6KW1B,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0WJ53MH0WKNR65CBX1,01HQDMCF0ZV7TYZWRR6RN5V4QT,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YCJ8EFYCBKHWN4VQN,01HQDMCF0Z46AY42FC8FR953D8,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0ZBQ3A70GE8K41RW1V,01HQDMCF0WNYW4ZH5G0M8MDAQA,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0W24YQHH4EKN91S0JY,01HQDMCF0WT7K7W5GBFF3HVXHE,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0YEQR7NJJJW88XNTW9,01HQDMCF11Z9VKEQXNZKEWXRDN,01HQDMCF0YRF7CB3DSC6DSAY54,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF106M1667NNPTDBKQDB,01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0YFZBAV5K4ZQ3495FR,01HQDMCF0ZMPTQ0GHG0V87W0J4,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0ZM8B6BJJK38PH7M2F,01HQDMCF0WBY0YT5MR53KG9HS8,01HQDMCF0WHNJ0P9BGVEFJE2JG,01HQDMCF109V71KW4MTFWVRTFR,and a lot more ~ 20k of them}'::text[]))
        Rows Removed by Filter: 16
Planning Time: 9.769 ms
Execution Time: 1.878 ms

I have tried:

CREATE INDEX product_id_idx ON products(id);

CREATE INDEX product_id_idx ON products USING HASH(id);

CREATE INDEX product_id_pattern_idx ON products USING btree (id text_pattern_ops);

But none of them fix the extreme slowness.

What index can I create to improve the query?

sql postgresql query-optimization database-performance database-indexes
1 Answer

I didn't go all the way up to 20k, but =ANY(ARRAY[]) seems unwilling to use the index you provide and instead sequentially scans a freshly vacuum analyzed table with 200k rows. Demo 1:

SELECT "products".* 
FROM "products" 
WHERE "id" = ANY(ARRAY[(random())::text,(random())::text,...around 400 of these ...(random())::text]) 
LIMIT 37520 OFFSET 0
QUERY PLAN
Limit  (cost=0.00..748971.00 rows=426 width=59) (actual time=17519.944..17519.952 rows=0 loops=1)
  Output: id, created_at, name
  ->  Seq Scan on public.products  (cost=0.00..748971.00 rows=426 width=59) (actual time=14365.282..14365.286 rows=0 loops=1)
        Output: id, created_at, name
        Filter: (products.id = ANY (ARRAY[(random())::text,(random())::text,...around 400 of these ...(random())::text]))
        Rows Removed by Filter: 200000
Planning Time: 1.387 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 10.140 ms, Inlining 81.166 ms, Optimization 1052.392 ms, Emission 2021.084 ms, Total 3164.782 ms
Execution Time: 17710.315 ms
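
One way to tell whether the planner is able to use an index for this query shape at all (a diagnostic sketch, not something run in the demos here) is to disable sequential scans for the session and re-run the EXPLAIN:

-- Diagnostic only: make sequential scans prohibitively expensive for this session,
-- then check whether the plan switches to an index scan.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT "products".*
FROM "products"
WHERE "id" = ANY('{01HQDMCF0S7QWQYBP2FW9HK8DS,01HQDMCF11PB3N1Q9TXME6KW1B}');

RESET enable_seqscan;

If the plan still shows a Seq Scan with enable_seqscan off, the planner cannot use the index for that form of the query; if it switches to an index scan, the sequential scan was a cost-based choice.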

If you swap it for the equivalent IN (SELECT UNNEST(ARRAY[])), the problem goes away. Demo 2:

SELECT "products".* 
FROM "products" 
WHERE "id" IN (SELECT UNNEST(ARRAY[(random())::text,(random())::text,...around 400 of these ...(random())::text])) 
LIMIT 37520 OFFSET 0
QUERY PLAN
Limit  (cost=6.41..1481.91 rows=426 width=59) (actual time=0.816..0.817 rows=0 loops=1)
  Output: products.id, products.created_at, products.name
  ->  Nested Loop  (cost=6.41..1481.91 rows=426 width=59) (actual time=0.815..0.816 rows=0 loops=1)
        Output: products.id, products.created_at, products.name
        ->  HashAggregate  (cost=6.41..8.41 rows=200 width=32) (actual time=0.198..0.251 rows=426 loops=1)
              Output: (unnest(ARRAY[(random())::text, ...400 of these..., (random())::text]))
              Group Key: unnest(ARRAY[(random())::text, ...400 of these..., (random())::text])
              Batches: 1  Memory Usage: 77kB
              ->  ProjectSet  (cost=0.00..5.34 rows=426 width=32) (actual time=0.069..0.102 rows=426 loops=1)
                    Output: unnest(ARRAY[(random())::text, ...400 of these..., (random())::text])
                    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
        ->  Index Scan using product_id_idx2 on public.products  (cost=0.00..7.36 rows=1 width=59) (actual time=0.001..0.001 rows=0 loops=426)
              Output: products.id, products.created_at, products.name
              Index Cond: (products.id = (unnest(ARRAY[(random())::text, ...400 of these..., (random())::text])))
Planning Time: 1.050 ms
Execution Time: 0.989 ms

It now uses the index and drops from 17s down to 1ms.
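
Applied to the original query, the rewrite would look roughly like this (a sketch; the ~20k-element list is elided here just as in the question):

SELECT "products".*
FROM "products"
WHERE "id" IN (SELECT UNNEST(ARRAY['01HQDMCF0S7QWQYBP2FW9HK8DS',
                                   '01HQDMCF11PB3N1Q9TXME6KW1B',
                                   '01HQDMCF0YFZBAV5K4ZQ3495FR'
                                   -- ... and the rest of the ~20k ids ...
                                   ]))
LIMIT 37520 OFFSET 0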
