Amazon RDS - Postgres not using index for SELECT queries

Question

I have a feeling that I'm doing something terribly wrong, but I can't seem to figure out what.

I am trying to run the following query:

Select col1, col2, col3, col4, col5, day, month, year,
       sum(num1) as sum_num1, 
       sum(num2) as sum_num2,
       count(*) as count_items
from test_table where day = 10 and month = 5 and year = 2020
group by col1, col2, col3, col4, col5, day, month, year;

I also have an index on day, month, year, created with the following command:

CREATE INDEX CONCURRENTLY testtable_dmy_idx on test_table (day, month, year);

I know about the setting that turns sequential scans on and off, and I tried experimenting with it on this query.

When I run the query above with SET enable_seqscan TO on; (which is the default, by the way) and EXPLAIN (analyze,buffers,timing), I get the following output:

-- Select Query with Sequential scan on 

QUERY PLAN
Finalize GroupAggregate  (cost=9733303.39..10836008.34 rows=5102790 width=89) (actual time=1100914.091..1110820.480 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=25020 read=2793049 dirtied=10040, temp read=74932 written=75039
  I/O Timings: read=1059425.134
  ->  Gather Merge  (cost=9733303.39..10607468.38 rows=6454984 width=89) (actual time=1100911.426..1110193.876 rows=795097 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956
        I/O Timings: read=3178066.529
        ->  Partial GroupAggregate  (cost=9732303.36..9861403.04 rows=3227492 width=89) (actual time=1100791.915..1107668.495 rows=265032 loops=3)
              Group Key: col1, col2, col3, col4, col5, day, month, year
              Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956
              I/O Timings: read=3178066.529
              ->  Sort  (cost=9732303.36..9740372.09 rows=3227492 width=81) (actual time=1100788.479..1105630.411 rows=2630708 loops=3)
                    Sort Key: col1, col2, col3, col4, col5
                    Sort Method: external merge  Disk: 241320kB
                    Worker 0:  Sort Method: external merge  Disk: 246776kB
                    Worker 1:  Sort Method: external merge  Disk: 246336kB
                    Buffers: shared hit=76964 read=8416562 dirtied=33686, temp read=230630 written=230956
                    I/O Timings: read=3178066.529
                    ->  Parallel Seq Scan on test_table  (cost=0.00..9074497.49 rows=3227492 width=81) (actual time=656277.982..1073808.146 rows=2630708 loops=3)
                          Filter: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Filter: 24027044
                          Buffers: shared hit=76855 read=8416561 dirtied=33686
                          I/O Timings: read=3178066.180
Planning Time: 4.017 ms
Execution Time: 1111033.041 ms
Total time - Around 18 minutes

Then, when I SET enable_seqscan TO off; and run the same query with EXPLAIN, I get the following:

-- Select Query with Sequential scan off

QUERY PLAN
Finalize GroupAggregate  (cost=10413126.05..11515831.01 rows=5102790 width=89) (actual time=59211.363..66579.750 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=3 read=104091, temp read=77942 written=78052
  I/O Timings: read=28662.857
  ->  Gather Merge  (cost=10413126.05..11287291.05 rows=6454984 width=89) (actual time=59211.262..65973.857 rows=795178 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=33 read=218096, temp read=230092 written=230418
        I/O Timings: read=51560.508
        ->  Partial GroupAggregate  (cost=10412126.03..10541225.71 rows=3227492 width=89) (actual time=57013.922..62453.555 rows=265059 loops=3)
              Group Key: col1, col2, col3, col4, col5, day, month, year
              Buffers: shared hit=33 read=218096, temp read=230092 written=230418
              I/O Timings: read=51560.508
              ->  Sort  (cost=10412126.03..10420194.76 rows=3227492 width=81) (actual time=57013.423..60368.530 rows=2630708 loops=3)
                    Sort Key: col1, col2, col3, col4, col5
                    Sort Method: external merge  Disk: 246944kB
                    Worker 0:  Sort Method: external merge  Disk: 246120kB
                    Worker 1:  Sort Method: external merge  Disk: 241408kB
                    Buffers: shared hit=33 read=218096, temp read=230092 written=230418
                    I/O Timings: read=51560.508
                    ->  Parallel Bitmap Heap Scan on test_table  (cost=527733.84..9754320.16 rows=3227492 width=81) (actual time=18155.864..30957.312 rows=2630708 loops=3)
                          Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                          Rows Removed by Index Recheck: 1423
                          Heap Blocks: exact=13374 lossy=44328
                          Buffers: shared hit=3 read=218096
                          I/O Timings: read=51560.508
                          ->  Bitmap Index Scan on testtable_dmy_idx  (cost=0.00..525797.34 rows=7745982 width=0) (actual time=18148.218..18148.228 rows=7892123 loops=1)
                                Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                                Buffers: shared hit=3 read=46389
                                I/O Timings: read=17368.250
Planning Time: 2.787 ms
Execution Time: 66783.481 ms
Total Time - Around 1 min

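I can't understand why I'm getting this behavior or what I'm doing wrong. I expected Postgres to optimize the query automatically, but that isn't happening.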

Any help would be greatly appreciated.

Edit 1:

More information about the RDS Postgres version:

SELECT version();

PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

Edit 2:

Ran with SET max_parallel_workers_per_gather TO 0 (the default value was 2, as shown by SHOW max_parallel_workers_per_gather):

-- Select Query with Sequential scan ON
QUERY PLAN
GroupAggregate  (cost=11515667.22..11799074.58 rows=5102790 width=89) (actual time=1120868.377..1133231.165 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321
  I/O Timings: read=1041723.362
  ->  Sort  (cost=11515667.22..11535032.17 rows=7745982 width=81) (actual time=1120865.304..1129419.809 rows=7892123 loops=1)
        Sort Key: col1, col2, col3, col4, col5
        Sort Method: external merge  Disk: 734304kB
        Buffers: shared hit=92456 read=8400966, temp read=295993 written=296321
        I/O Timings: read=1041723.362
        ->  Seq Scan on test_table  (cost=0.00..9888011.58 rows=7745982 width=81) (actual time=663266.269..1070560.993 rows=7892123 loops=1)
              Filter: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Filter: 72081131
              Buffers: shared hit=92450 read=8400966
              I/O Timings: read=1041723.362
Planning Time: 5.829 ms
Execution Time: 1133422.968 ms
Total Time - Around 18 mins

Followed by:

-- Select Query with Sequential scan OFF
QUERY PLAN
GroupAggregate  (cost=12190966.21..12474373.57 rows=5102790 width=89) (actual time=109048.306..119255.079 rows=491640 loops=1)
  Group Key: col1, col2, col3, col4, col5, day, month, year
  Buffers: shared hit=3 read=218096, temp read=295993 written=296321
  I/O Timings: read=55697.723
  ->  Sort  (cost=12190966.21..12210331.17 rows=7745982 width=81) (actual time=109047.621..115468.268 rows=7892123 loops=1)
        Sort Key: col1, col2, col3, col4, col5
        Sort Method: external merge  Disk: 734304kB
        Buffers: shared hit=3 read=218096, temp read=295993 written=296321
        I/O Timings: read=55697.723
        ->  Bitmap Heap Scan on test_table  (cost=527733.84..10563310.57 rows=7745982 width=81) (actual time=16941.764..62203.367 rows=7892123 loops=1)
              Recheck Cond: ((day = 10) AND (month = 5) AND (year = 2020))
              Rows Removed by Index Recheck: 4270
              Heap Blocks: exact=39970 lossy=131737
              Buffers: shared hit=3 read=218096
              I/O Timings: read=55697.723
              ->  Bitmap Index Scan on testtable_dmy_idx  (cost=0.00..525797.34 rows=7745982 width=0) (actual time=16933.964..16933.964 rows=7892123 loops=1)
                    Index Cond: ((day = 10) AND (month = 5) AND (year = 2020))
                    Buffers: shared hit=3 read=46389
                    I/O Timings: read=16154.294
Planning Time: 3.684 ms
Execution Time: 119440.147 ms
Total Time - Around 2 mins
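Putting the four plans side by side helps explain the behavior asked about. PostgreSQL's planner picks the plan with the lowest estimated total cost, and here the index-based plans are estimated to be only about 6% more expensive, yet they finish roughly 9.5 to 16.6 times faster. The numbers below are copied from the plans above; the `plans` dictionary and the `compare` helper are hypothetical names used only for illustration.

```python
from typing import Tuple

# (estimated total cost of the top plan node, measured execution time in ms),
# copied from the EXPLAIN ANALYZE output in the question and in Edit 2.
plans = {
    "parallel, seqscan on": (10836008.34, 1111033.041),
    "parallel, seqscan off": (11515831.01, 66783.481),
    "serial, seqscan on": (11799074.58, 1133422.968),
    "serial, seqscan off": (12474373.57, 119440.147),
}


def compare(seq_key: str, index_key: str) -> Tuple[float, float]:
    """Return (estimated cost ratio, actual speedup) of the index plan vs the seq-scan plan."""
    seq_cost, seq_ms = plans[seq_key]
    idx_cost, idx_ms = plans[index_key]
    # Above 1.0 means the planner believes the index plan is more expensive.
    cost_ratio = idx_cost / seq_cost
    # Above 1.0 means the index plan actually finished faster.
    speedup = seq_ms / idx_ms
    return cost_ratio, speedup


for mode in ("parallel", "serial"):
    ratio, speedup = compare(f"{mode}, seqscan on", f"{mode}, seqscan off")
    print(f"{mode}: index plan estimated at {ratio:.3f}x the cost, ran {speedup:.1f}x faster")
```

A small estimated cost gap combined with a large real-time gap points at the cost model's inputs (page costs, correlation, table layout) rather than at the query text.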

Edit 3:

I checked the number of inserts, updates, deletes, live tuples and dead tuples using the following:

SELECT n_tup_ins as "inserts",n_tup_upd as "updates",n_tup_del as "deletes", n_live_tup as "live_tuples", n_dead_tup as "dead_tuples"
FROM pg_stat_user_tables
where relname = 'test_table';

and got the following result:

| inserts     | updates | deletes   | live_tuples | dead_tuples |
|-------------|---------|-----------|-------------|-------------|
| 296590964   | 0       | 412400995 | 79717032    | 7589442     |

Then I ran the following command:

VACUUM (VERBOSE, ANALYZE) test_table

which produced the following output:

[2020-05-15 18:34:08] [00000] vacuuming "public.test_table"
[2020-05-15 18:37:13] [00000] scanned index "testtable_dmy_idx" to remove 7573896 row versions
[2020-05-15 18:37:56] [00000] scanned index "testtable_unixts_idx" to remove 7573896 row versions
[2020-05-15 18:38:16] [00000] "test_table": removed 7573896 row versions in 166450 pages
[2020-05-15 18:38:16] [00000] index "testtable_dmy_idx" now contains 79973254 row versions in 1103313 pages
[2020-05-15 18:38:16] [00000] index "testtable_unixts_idx" now contains 79973254 row versions in 318288 pages
[2020-05-15 18:38:16] [00000] "test_table": found 99 removable, 2196653 nonremovable row versions in 212987 out of 8493416 pages
[2020-05-15 18:38:16] [00000] vacuuming "pg_toast.pg_toast_25023"
[2020-05-15 18:38:16] [00000] index "pg_toast_25023_index" now contains 0 row versions in 1 pages
[2020-05-15 18:38:16] [00000] "pg_toast_25023": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
[2020-05-15 18:38:16] [00000] analyzing "public.test_table"
[2020-05-15 18:38:27] [00000] "test_table": scanned 30000 of 8493416 pages, containing 282611 live rows and 0 dead rows; 30000 rows in sample, 80011093 estimated total rows
[2020-05-15 18:38:27] completed in 4 m 19 s 58 ms
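The last "analyzing" line shows how the row estimate is derived: the live-row density of the 30000 sampled pages is scaled up to the full 8493416 pages, which works out to roughly 9.4 rows per page. A minimal sketch of that extrapolation, assuming it mirrors the `floor(density * total_pages + 0.5)` rounding in PostgreSQL's `acquire_sample_rows` (src/backend/commands/analyze.c); the helper name is hypothetical.

```python
import math


def estimate_reltuples(live_rows_in_sample: int, pages_sampled: int, total_pages: int) -> int:
    """Scale the live-row density of the sampled pages up to the whole table.

    Hypothetical helper; the floor(x + 0.5) rounding is assumed to mirror
    acquire_sample_rows() in PostgreSQL's analyze.c.
    """
    density = live_rows_in_sample / pages_sampled  # live rows per sampled page
    return math.floor(density * total_pages + 0.5)


# Values from the "analyzing public.test_table" line of the VACUUM VERBOSE log.
estimate = estimate_reltuples(282611, 30000, 8493416)
print(estimate)  # 80011093, the same figure the log reports
```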

After that, the same pg_stat_user_tables query returned:

| inserts   | updates | deletes   | live_tuples | dead_tuples |
|-----------|---------|-----------|-------------|-------------|
| 296590964 | 0       | 412400995 | 80011093    | 0           |

Answer 1

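In general, and especially for your query, COUNT(*) together with SUM(...) and GROUP BY tends to be a performance killer. The reason is that, in order to arrive at the count and sum for each multi-column group, Postgres has to visit every record's representation in the index. As a result, Postgres cannot logically eliminate any records, and in such cases it tends not to use the index.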

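A case where an index would be used with a GROUP BY query would be if the query had a HAVING clause using MIN or MAX of some column. Also, if your query has a WHERE clause, an index could be used there. But your current query cannot be optimized very much.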

Answer 2

   Rows Removed by Filter: 24027044
   Buffers: shared hit=76855 read=8416561 dirtied=33686
   I/O Timings: read=3178066.180

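A lot of buffers are being dirtied during the seq scan. I'm guessing you haven't vacuumed the table enough recently. Or autovacuum is lagging behind, because you accepted the default settings, which (until v12) are far too slow for most modern dedicated systems.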
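To put rough numbers on the autovacuum point, here is a sketch that assumes community PostgreSQL 11 defaults (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2, vacuum_cost_limit = 200, autovacuum_vacuum_cost_delay = 20ms, vacuum_cost_page_miss = 10). An RDS parameter group may override any of these, and autovacuum really uses pg_class.reltuples, with n_live_tup from Edit 3 standing in for it here. On those assumptions the 7589442 dead tuples are still below the roughly 15.9 million needed to trigger autovacuum, and cost throttling caps it at about 1000 uncached pages per second.

```python
# Community PostgreSQL 11 defaults. Assumption: the RDS parameter group has not
# overridden them (RDS defaults can differ; check with SHOW on the instance).
AUTOVACUUM_VACUUM_THRESHOLD = 50
AUTOVACUUM_VACUUM_SCALE_FACTOR = 0.2
VACUUM_COST_LIMIT = 200        # used because autovacuum_vacuum_cost_limit defaults to -1
AUTOVACUUM_COST_DELAY_MS = 20  # lowered to 2 ms in PostgreSQL 12
VACUUM_COST_PAGE_MISS = 10     # cost charged for a page not found in shared buffers
PAGE_SIZE = 8192               # bytes


def autovacuum_trigger(reltuples: float) -> float:
    """Dead tuples needed before autovacuum picks the table up."""
    return AUTOVACUUM_VACUUM_THRESHOLD + AUTOVACUUM_VACUUM_SCALE_FACTOR * reltuples


def max_miss_pages_per_second() -> float:
    """Upper bound on uncached pages read per second under cost-based throttling.

    Simplified: ignores time spent doing real work and the extra cost charged
    for dirtying pages, both of which can only lower the rate.
    """
    pages_per_sleep = VACUUM_COST_LIMIT / VACUUM_COST_PAGE_MISS
    return pages_per_sleep * 1000 / AUTOVACUUM_COST_DELAY_MS


# n_live_tup / n_dead_tup from Edit 3, before the manual VACUUM
# (n_live_tup is a stand-in for pg_class.reltuples).
live, dead = 79717032, 7589442
trigger = autovacuum_trigger(live)
pages_per_s = max_miss_pages_per_second()
mib_per_s = pages_per_s * PAGE_SIZE / 2**20
print(f"dead={dead}, trigger={trigger:.0f}, due={dead > trigger}")
print(f"throttled read rate <= {pages_per_s:.0f} pages/s ({mib_per_s:.1f} MiB/s)")
```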

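Also, 24027044/8416561 = about 2.85 rows per page. That is a very low number. Are your tuples extremely wide? Is your table extremely bloated? Neither of these answers your question, because the planner should know about such things and take them into account, but we might need to know them to figure out what is going wrong with the planner. (These calculations may be off, since I don't know which numbers are prorated over the workers and which are not, but I don't think a factor of 3 would change the conclusion that something is weird here.)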
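The factor-of-3 question can be checked against the plans themselves. EXPLAIN ANALYZE reports rows in a node with loops=3 as a per-loop average, while the seq scan node's buffer counts appear to be totals: hit + read add up to exactly 8493416, the page count VACUUM VERBOSE reports in Edit 3, and the serial plan in Edit 2 removes 72081131 rows, about 3 × 24027044. A small sketch of both calculations; the variable names are illustrative only.

```python
# Parallel Seq Scan node from the first plan (loops=3).
removed_per_loop = 24027044   # "Rows Removed by Filter" is a per-loop average
returned_per_loop = 2630708   # "rows=" is a per-loop average as well
loops = 3
buffers_hit, buffers_read = 76855, 8416561  # appear to be totals across all 3 processes

pages_in_table = buffers_hit + buffers_read
naive_rows_per_page = removed_per_loop / buffers_read        # the ~2.85 in the answer
total_rows = (removed_per_loop + returned_per_loop) * loops  # every row the scan looked at
rows_per_page = total_rows / pages_in_table

print(pages_in_table, round(naive_rows_per_page, 2), total_rows, round(rows_per_page, 2))
```

On these numbers the table holds roughly 9.4 rows per 8 kB page rather than 2.85. That still seems low next to the 81-byte width estimated for the scanned columns, so the wide-tuple and bloat questions remain relevant.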

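8416561 * 1024 * 8 / 3178.066 / 1024 / 1024 = 20 MB/s. That seems pretty low. What IO settings have you provisioned on your RDS "hardware"? Perhaps your seq_page_cost and random_page_cost settings are wrong for your actual IO capacity. (Although changing them might not be very effective; see below.)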
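The same arithmetic as a small helper, assuming the default 8 kB block size. One caveat, inferred from the numbers rather than stated in the plan: the parallel seq scan's I/O time (about 3178 s) exceeds the whole query's 1111 s execution time, so it is presumably summed over the leader and both workers, which makes ~20 MB/s a per-process rate. The serial plan from Edit 2 gives about 63 MB/s by the same formula.

```python
PAGE_SIZE = 8192  # bytes; PostgreSQL's default block size


def mib_per_second(pages_read: int, io_time_ms: float) -> float:
    """Read throughput implied by a Buffers 'read=' count and its I/O Timings."""
    return pages_read * PAGE_SIZE / (io_time_ms / 1000) / 2**20


# Parallel Seq Scan node (first plan): I/O time presumably summed over 3 processes.
parallel_rate = mib_per_second(8416561, 3178066.180)
# Serial Seq Scan node (Edit 2): a single process.
serial_rate = mib_per_second(8400966, 1041723.362)
print(f"parallel: {parallel_rate:.1f} MiB/s per process, serial: {serial_rate:.1f} MiB/s")
```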

For your bitmap heap scan:

Heap Blocks: exact=13374 lossy=44328
Buffers: shared hit=3 read=218096

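It looks like all the qualifying tuples are concentrated in a very small number of blocks (compared to the overall table size as shown by the seq scan). I think the planner does not adequately take this into account for bitmap scans. There is a patch out there for this, but it missed the v13 deadline. (If no one reviews it, it will probably miss the v14 deadline too. Hint, hint.) Basically, the planner knows that the "day" column has a high correlation with the physical order of the table, and uses this knowledge to conclude that the bitmap heap scan will be almost all sequential IO. But it fails to also infer that it will scan only a small fraction of the table. This makes the bitmap scan look just like a seq scan, but with an extra layer of overhead (consulting the index), so it is not surprising that it isn't used.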
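The concentration is easy to quantify from the serial bitmap plan in Edit 2 (used here because its counters are not split across parallel workers). In this rough sketch, with illustrative variable names, the heap blocks visited (exact + lossy) match the heap-side buffer reads exactly and amount to about 2% of the table.

```python
# Serial Bitmap Heap Scan from Edit 2.
total_pages = 8493416                  # heap size from VACUUM VERBOSE (Edit 3)
exact_blocks, lossy_blocks = 39970, 131737
buffers_read_total = 218096            # Bitmap Heap Scan node; includes its index child
index_pages_read = 46389               # Bitmap Index Scan on testtable_dmy_idx
matched_rows = 7892123

heap_blocks = exact_blocks + lossy_blocks
heap_pages_read = buffers_read_total - index_pages_read
fraction_of_table = heap_blocks / total_pages
seqscan_to_bitmap_pages = total_pages / heap_blocks
rows_per_heap_block = matched_rows / heap_blocks

print(heap_blocks, heap_pages_read, f"{fraction_of_table:.1%}",
      round(seqscan_to_bitmap_pages, 1), round(rows_per_heap_block, 1))
```

So the bitmap plan touches roughly one heap page in 50. The ~46 matching rows per block are also well above the table-wide average of about 80 million rows over 8.49 million pages; if row widths are similar, that hints that much of the heap is sparsely filled.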
