I have a question about a strange(?) sorting case I found in PostgreSQL (specifically 10.3).
I have a table users with the following columns:
- id: varchar(36), the id is in UUID format
- firstname: varchar(255)
- lastname: varchar(255)
The following indexes are created:
create unique index users_pkey on users (id);
create index user_firstname on users (firstname);
create index user_lastname on users (lastname);
Now, let's consider two queries on each of two data sets.
Data set 1: firstname is a random 10-character string.
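To reproduce data set 1 outside the database, a minimal Python sketch like the following could generate the rows. The question only says "random 10-character string", so the alphabet (ASCII letters), the seeded UUID generation and the function name make_dataset_1 are assumptions, not the author's actual setup.

```python
import random
import string
import uuid


def make_dataset_1(n=100_003, seed=None):
    """Hypothetical generator for data set 1: random 10-character first names.

    Assumption: letters only; the question does not state the alphabet.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # UUID-formatted id as text, like the varchar(36) id column
        row_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        firstname = "".join(rng.choices(string.ascii_letters, k=10))
        rows.append((row_id, firstname))
    return rows
```

The (id, firstname) pairs could then be written out with Python's csv module and loaded with psql's \copy; lastname is left out because neither query reads it.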
1a)
select id, firstname from users order by firstname asc, id asc limit 50;
And the execution plan for this query:
Limit (cost=7665.06..7665.18 rows=50 width=48) (actual time=105.012..105.016 rows=50 loops=1)
-> Sort (cost=7665.06..7915.07 rows=100003 width=48) (actual time=105.012..105.014 rows=50 loops=1)
Sort Key: firstname, id
Sort Method: top-N heapsort Memory: 31kB
-> Seq Scan on users (cost=0.00..4343.03 rows=100003 width=48) (actual time=0.009..21.510 rows=100003 loops=1)
Planning time: 0.066 ms
Execution time: 105.031 ms
1b)
select id, firstname from users order by firstname desc, id desc limit 50;
The ordering is changed: desc instead of asc.
And the execution plan for this query:
Limit (cost=7665.06..7665.18 rows=50 width=48) (actual time=105.586..105.590 rows=50 loops=1)
-> Sort (cost=7665.06..7915.07 rows=100003 width=48) (actual time=105.586..105.589 rows=50 loops=1)
Sort Key: firstname DESC, id DESC
Sort Method: top-N heapsort Memory: 31kB
-> Seq Scan on users (cost=0.00..4343.03 rows=100003 width=48) (actual time=0.010..21.670 rows=100003 loops=1)
Planning time: 0.068 ms
Execution time: 105.606 ms
So far, so good. Sorting in both directions takes about the same time.
Data set 2: firstname is a string of the format JohnXXXXX, where XXXXX is a numeric sequence, i.e. John00000, John00001, John00002, John00003, ..., John99998, John99999.
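Data set 2 can be sketched the same way. This hypothetical make_dataset_2 generates the 100,000 names John00000 to John99999 in ascending order; the plans report 100,003 rows, and the remaining 3 rows are not described in the question, so they are omitted. Generating (and therefore inserting) the names in ascending order is an assumption.

```python
import random
import uuid


def make_dataset_2(seed=None):
    """Hypothetical generator for data set 2: John00000 ... John99999.

    Assumption: rows are produced in ascending firstname order.
    """
    rng = random.Random(seed)
    return [
        # Zero-padded 5-digit suffix, e.g. John00042
        (str(uuid.UUID(int=rng.getrandbits(128), version=4)), f"John{i:05d}")
        for i in range(100_000)
    ]
```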
2a)
select id, firstname from users order by firstname asc, id asc limit 50;
And the execution plan for this query:
Limit (cost=7665.06..7665.18 rows=50 width=43) (actual time=99.572..99.577 rows=50 loops=1)
-> Sort (cost=7665.06..7915.07 rows=100003 width=43) (actual time=99.572..99.573 rows=50 loops=1)
Sort Key: firstname, id
Sort Method: top-N heapsort Memory: 29kB
-> Seq Scan on users (cost=0.00..4343.03 rows=100003 width=43) (actual time=0.009..23.660 rows=100003 loops=1)
Planning time: 0.064 ms
Execution time: 99.592 ms
2b)
select id, firstname from users order by firstname desc, id desc limit 50;
The ordering is changed: desc instead of asc.
And the execution plan for this query:
Limit (cost=7665.06..7665.18 rows=50 width=43) (actual time=659.786..659.791 rows=50 loops=1)
-> Sort (cost=7665.06..7915.07 rows=100003 width=43) (actual time=659.785..659.786 rows=50 loops=1)
Sort Key: firstname DESC, id DESC
Sort Method: top-N heapsort Memory: 32kB
-> Seq Scan on users (cost=0.00..4343.03 rows=100003 width=43) (actual time=0.010..21.510 rows=100003 loops=1)
Planning time: 0.066 ms
Execution time: 659.804 ms
For the second data set, the second query (2b) is about 6.6 times slower than 2a (659.804 ms vs 99.592 ms).
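All four plans use "Sort Method: top-N heapsort". A simplified Python model of such a bounded heap can be used to explore how the order in which rows arrive affects the amount of work. It is only a sketch, not PostgreSQL's tuplesort code: it compares firstname alone (not firstname, id), uses plain Python string comparison instead of the column's collation, and assumes the Seq Scan returns data set 2 in ascending firstname order, which the question does not state. The names Key and top_n are hypothetical.

```python
import heapq
import random
import string


class Key:
    """Wraps a value and counts every comparison the heap makes on it."""

    comparisons = 0

    def __init__(self, value, invert):
        self.value = value
        self.invert = invert  # True turns heapq's min-heap into a max-heap

    def __lt__(self, other):
        Key.comparisons += 1
        if self.invert:
            return other.value < self.value
        return self.value < other.value


def top_n(values, n, descending):
    """Bounded-heap top-N: return the first n values in the requested order.

    The heap root is always the worst value kept so far, so each incoming
    value is compared with the root and only replaces it if it is better.
    Returns (result, number_of_comparisons).
    """
    Key.comparisons = 0
    heap = []
    for v in values:
        # DESC keeps the n largest, so the root must be the smallest (min-heap).
        # ASC keeps the n smallest, so the root must be the largest (inverted).
        key = Key(v, invert=not descending)
        if len(heap) < n:
            heapq.heappush(heap, key)
        elif heap[0] < key:
            # New value beats the current worst: replace the root and sift down.
            heapq.heapreplace(heap, key)
        # Otherwise the value is discarded after a single comparison.
    comparisons = Key.comparisons
    result = sorted((k.value for k in heap), reverse=descending)
    return result, comparisons


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed to be the order in which the Seq Scan returns data set 2
    in_order = [f"John{i:05d}" for i in range(100_000)]
    random_names = ["".join(rng.choices(string.ascii_letters, k=10)) for _ in range(100_000)]
    for label, data in (("random names", random_names), ("John, ascending", in_order)):
        for descending in (False, True):
            _, cmps = top_n(data, 50, descending)
            direction = "DESC" if descending else "ASC"
            print(f"{label:16} {direction:4} comparisons: {cmps}")
```

In this model, on in-order John names the DESC run should need several times as many comparisons as the ASC run, because every new row beats the current root and forces a sift-down, while on random names both directions should stay close to one comparison per row. Whether this carries over to PostgreSQL depends on the physical row order, which EXPLAIN does not show.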
To summarize (execution times):
+------------------+------------+------------+
| Query \ Data set | 1          | 2          |
+------------------+------------+------------+
| a (ASC)          | 105.031 ms | 99.592 ms  |
| b (DESC)         | 105.606 ms | 659.804 ms |
+------------------+------------+------------+
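Finally, my question: why is the second query on the second data set (2b) 6-7 times slower than the other three?
Did you rebuild the indexes after adding the extra 50k rows? Check for fragmentation.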