How do I get Postgres to use my functional index

Question

I have the following table:

CREATE TABLE items
(
    id NUMERIC(20, 0) NOT NULL DEFAULT NEXTVAL('items_sequence') PRIMARY KEY,
    item_price NUMERIC(19, 2) DEFAULT NULL NULL,
    status NUMERIC(2, 0) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);

with the following index:

CREATE INDEX items_dash_idx ON items (status, DATE(created_at));

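I want to group my items by status and by day over about 30 days. That is, for each day of the past 30 days I want the count and the total item price per status, including days where the count/amount is 0. I have 5 statuses; one of them (50) is irrelevant and has far more rows than the others (e.g. over the past 30 days status 50 has ~400k rows, while statuses 10, 20, 30 and 40 have ~1k each).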
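As a reference point for what the query should return, here is a minimal sketch in plain Python rather than SQL. It is illustrative only: the function name daily_totals, the (status, item_price, created_at) tuple format and the sample rows are assumptions, not part of the question. It buckets rows by calendar day, ignores status 50, and zero-fills every (day, status) pair. For the query's range 2024-09-18 to 2024-10-18 (both ends inclusive, i.e. 31 days) that gives 31 x 4 = 124 rows.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from decimal import Decimal

# Statuses the report cares about; status 50 is deliberately excluded.
STATUSES = (10, 20, 30, 40)


def daily_totals(rows, start, end):
    """Return {(day, status): (count, amount)} for every day in [start, end].

    rows: iterable of (status, item_price, created_at) tuples; item_price may
    be None (the column is nullable) and created_at is a datetime.
    """
    totals = defaultdict(lambda: [0, Decimal("0")])
    for status, price, created_at in rows:
        day = created_at.date()  # same day bucketing as DATE(created_at)
        if status in STATUSES and start <= day <= end:
            bucket = totals[(day, status)]
            bucket[0] += 1              # COUNT(i.id) counts the row
            if price is not None:       # SUM skips NULL prices
                bucket[1] += price

    # Zero-fill: one entry per (day, status) pair, like the calendar LEFT JOIN.
    result = {}
    day = start
    while day <= end:
        for status in STATUSES:
            count, amount = totals.get((day, status), (0, Decimal("0")))
            result[(day, status)] = (count, amount)
        day += timedelta(days=1)
    return result


# Hypothetical sample rows; the status-50 row must not appear in the output.
rows = [
    (10, Decimal("5.00"), datetime(2024, 9, 18, 9, 30)),
    (10, None, datetime(2024, 9, 18, 17, 0)),
    (50, Decimal("1.00"), datetime(2024, 9, 18, 12, 0)),
]
report = daily_totals(rows, date(2024, 9, 18), date(2024, 10, 18))
print(len(report))                      # 124 (31 days x 4 statuses)
print(report[(date(2024, 9, 18), 10)])  # (2, Decimal('5.00'))
print(report[(date(2024, 9, 19), 20)])  # (0, Decimal('0'))
```

The zero-fill loop plays the role of the generated calendar in the SQL: without it, days with no items would simply be missing from the result.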

I have the following query:

SELECT COUNT(i.id)                    AS count,
       COALESCE(SUM(i.item_price), 0) AS amount,
       dates_table.status,
       dates_table.created_at
FROM (SELECT created_at::DATE AS created_at, 10 AS status
      FROM GENERATE_SERIES('2024-09-18'::DATE, '2024-10-18'::DATE, INTERVAL '1 DAY') AS created_at
      UNION
      SELECT created_at::DATE AS created_at, 20 AS status
      FROM GENERATE_SERIES('2024-09-18'::DATE, '2024-10-18'::DATE, INTERVAL '1 DAY') AS created_at
      UNION
      SELECT created_at::DATE AS created_at, 30 AS status
      FROM GENERATE_SERIES('2024-09-18'::DATE, '2024-10-18'::DATE, INTERVAL '1 DAY') AS created_at
      UNION
      SELECT created_at::DATE AS created_at, 40 AS status
      FROM GENERATE_SERIES('2024-09-18'::DATE, '2024-10-18'::DATE, INTERVAL '1 DAY') AS created_at
     ) AS dates_table
LEFT JOIN items i 
       ON i.status = dates_table.status
      AND DATE(i.created_at) = dates_table.created_at
GROUP BY dates_table.created_at, dates_table.status
ORDER BY dates_table.created_at, dates_table.status;

This query takes more than 10 seconds. Here is the output of EXPLAIN (ANALYZE, BUFFERS):

QUERY PLAN
Sort  (cost=2242005.05..2242006.05 rows=400 width=48) (actual time=21950.589..21950.601 rows=72 loops=1)
  Sort Key: dates_table.created_at, dates_table.status
  Sort Method: quicksort  Memory: 29kB
  Buffers: shared hit=676950 read=747852 dirtied=755, temp read=28515 written=28531
  ->  HashAggregate  (cost=2241982.76..2241987.76 rows=400 width=48) (actual time=21950.436..21950.492 rows=72 loops=1)
        Group Key: dates_table.created_at, dates_table.status
        Batches: 1  Memory Usage: 61kB
        Buffers: shared hit=676947 read=747852 dirtied=755, temp read=28515 written=28531
        ->  Merge Left Join  (cost=2161026.21..2239512.33 rows=247043 width=20) (actual time=21834.112..21948.382 rows=11066 loops=1)
              Merge Cond: ((dates_table.created_at = (date(i.created_at))) AND (((dates_table.status)::numeric) = i.status))
              Buffers: shared hit=676947 read=747852 dirtied=755, temp read=28515 written=28531
              ->  Sort  (cost=449.35..459.35 rows=4000 width=8) (actual time=895.905..895.933 rows=72 loops=1)
                    Sort Key: dates_table.created_at, ((dates_table.status)::numeric)
                    Sort Method: quicksort  Memory: 28kB
                    Buffers: shared hit=4
                    ->  Subquery Scan on dates_table  (cost=130.03..210.03 rows=4000 width=8) (actual time=895.792..895.846 rows=72 loops=1)
                          ->  HashAggregate  (cost=130.03..170.03 rows=4000 width=8) (actual time=895.788..895.831 rows=72 loops=1)
                                Group Key: ((created_ai.created_at)::date), (10)
                                Batches: 1  Memory Usage: 217kB
                                ->  Append  (cost=0.01..110.03 rows=4000 width=8) (actual time=895.697..895.749 rows=72 loops=1)
                                      ->  Function Scan on generate_series created_at  (cost=0.01..12.51 rows=1000 width=8) (actual time=895.694..895.697 rows=18 loops=1)
                                      ->  Function Scan on generate_series created_at_1  (cost=0.01..12.51 rows=1000 width=8) (actual time=0.012..0.014 rows=18 loops=1)
                                      ->  Function Scan on generate_series created_at_2  (cost=0.01..12.51 rows=1000 width=8) (actual time=0.010..0.012 rows=18 loops=1)
                                      ->  Function Scan on generate_series created_at_3  (cost=0.01..12.51 rows=1000 width=8) (actual time=0.010..0.012 rows=18 loops=1)
              ->  Materialize  (cost=2160576.87..2185898.76 rows=5064379 width=25) (actual time=19123.895..20601.926 rows=5066445 loops=1)
                    Buffers: shared hit=676943 read=747852 dirtied=755, temp read=28515 written=28531
                    ->  Sort  (cost=2160576.87..2173237.82 rows=5064379 width=25) (actual time=19123.888..20125.620 rows=5066445 loops=1)
                          Sort Key: (date(i.created_at)), i.status
                          Sort Method: external merge  Disk: 228120kB
                          Buffers: shared hit=676943 read=747852 dirtied=755, temp read=28515 written=28531
                          ->  Seq Scan on items i  (cost=0.00..1475438.79 rows=5064379 width=25) (actual time=0.064..16526.846 rows=5066445 loops=1)
                                Buffers: shared hit=676943 read=747852 dirtied=755
Planning Time: 0.399 ms
JIT:
  Functions: 44
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 2.096 ms, Inlining 293.474 ms, Optimization 383.558 ms, Emission 218.758 ms, Total 897.885 ms
Execution Time: 21989.150 ms

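When I run this query, my cache hit ratio drops from 99.9% to 50%. In Oracle, the same index (using TRUNC(created_at) instead of DATE(created_at), obviously) and the same query take about 500 ms. I assume I'm missing something.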

Answer 1

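Run VACUUM ANALYZE items. If I set up a dataset similar to the one you describe, I do get an index scan.
This answer is mostly meant to prompt an edit: please add the output of explain(analyze, verbose, buffers, settings), show how it differs from the plan below, and fill in any other details about your dataset that make it different from this one.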

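You can shorten, simplify and speed up the query a bit: generate the calendar once and let unnest() produce the target statuses for each day:
(demo on db<>fiddle)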

explain analyze verbose
SELECT COUNT(i.id)                    AS count,
       COALESCE(SUM(i.item_price), 0) AS amount,
       dates_table.status,
       dates_table.created_at
FROM (SELECT created_at::date
           , unnest(array[10,20,30,40]) AS status
      FROM GENERATE_SERIES('2024-09-18'::date,'2024-10-18','1 DAY') AS created_at
     ) AS dates_table
LEFT JOIN items i 
       ON i.status = dates_table.status
      AND DATE(i.created_at) = dates_table.created_at
GROUP BY dates_table.created_at, dates_table.status
ORDER BY dates_table.created_at, dates_table.status;

QUERY PLAN
Sort  (cost=34735.27..34736.27 rows=400 width=48) (actual time=52.316..52.329 rows=124 loops=1)
  Output: (count(i.id)), (COALESCE(sum(i.item_price), '0'::numeric)), (unnest('{10,20,30,40}'::integer[])), ((created_at.created_at)::date)
  Sort Key: ((created_at.created_at)::date), (unnest('{10,20,30,40}'::integer[]))
  Sort Method: quicksort  Memory: 33kB
  ->  HashAggregate  (cost=34712.98..34717.98 rows=400 width=48) (actual time=52.148..52.229 rows=124 loops=1)
        Output: count(i.id), COALESCE(sum(i.item_price), '0'::numeric), (unnest('{10,20,30,40}'::integer[])), ((created_at.created_at)::date)
        Group Key: ((created_at.created_at)::date), (unnest('{10,20,30,40}'::integer[]))
        Batches: 1  Memory Usage: 93kB
        ->  Nested Loop Left Join  (cost=0.43..34162.98 rows=55000 width=28) (actual time=0.086..43.184 rows=18355 loops=1)
              Output: (unnest('{10,20,30,40}'::integer[])), ((created_at.created_at)::date), i.id, i.item_price
              ->  ProjectSet  (cost=0.01..40.01 rows=4000 width=8) (actual time=0.024..0.267 rows=124 loops=1)
                    Output: (created_at.created_at)::date, unnest('{10,20,30,40}'::integer[])
                    ->  Function Scan on pg_catalog.generate_series created_at  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.021..0.042 rows=31 loops=1)
                          Output: created_at.created_at
                          Function Call: generate_series(('2024-09-18'::date)::timestamp with time zone, '2024-10-18 00:00:00+00'::timestamp with time zone, '1 day'::interval)
              ->  Index Scan using items_dash_idx on public.items i  (cost=0.43..7.92 rows=60 width=32) (actual time=0.021..0.315 rows=148 loops=124)
                    Output: i.id, i.item_price, i.status, i.created_at
                    Index Cond: ((i.status = ((unnest('{10,20,30,40}'::integer[])))::numeric) AND (date(i.created_at) = ((created_at.created_at)::date)))
Planning Time: 0.335 ms
Execution Time: 52.412 ms
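If you want to check the result shape of the simplified query without a Postgres server, the sketch below runs an analogous statement in SQLite through Python's standard sqlite3 module. This is a stand-in, not Postgres: SQLite has no unnest() and, depending on the build, no generate_series(), so a recursive CTE builds the calendar once and a VALUES list plays the role of unnest(array[10,20,30,40]). REAL stands in for NUMERIC(19, 2), and the sample rows are made up. It demonstrates only the result (124 rows with zeros filled in), not anything about the Postgres planner or index usage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id         INTEGER PRIMARY KEY,
    item_price REAL,               -- stand-in for NUMERIC(19, 2)
    status     INTEGER NOT NULL,
    created_at TEXT    NOT NULL    -- ISO-8601 'YYYY-MM-DD HH:MM:SS'
);
-- Hypothetical sample data; the status-50 row must be ignored.
INSERT INTO items (item_price, status, created_at) VALUES
    (5.00, 10, '2024-09-18 09:30:00'),
    (NULL, 10, '2024-09-18 17:00:00'),
    (1.00, 50, '2024-09-18 12:00:00');
""")

QUERY = """
WITH RECURSIVE calendar(day) AS (          -- the calendar, generated once
    SELECT date('2024-09-18')
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < '2024-10-18'
),
statuses(status) AS (VALUES (10), (20), (30), (40)),  -- stand-in for unnest()
dates_table AS (
    SELECT day AS created_at, status FROM calendar CROSS JOIN statuses
)
SELECT COUNT(i.id)                    AS count,
       COALESCE(SUM(i.item_price), 0) AS amount,
       dates_table.status,
       dates_table.created_at
FROM dates_table
LEFT JOIN items i
       ON i.status = dates_table.status
      AND date(i.created_at) = dates_table.created_at
GROUP BY dates_table.created_at, dates_table.status
ORDER BY dates_table.created_at, dates_table.status
"""

rows = conn.execute(QUERY).fetchall()
print(len(rows))  # 124
print(rows[0])    # (2, 5.0, 10, '2024-09-18')
print(rows[1])    # (0, 0, 20, '2024-09-18')
```

Because the calendar is the left side of the LEFT JOIN, every (day, status) pair survives even when no item matches, and COALESCE turns the resulting NULL sum into 0.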
