How to optimize prefix search in a PostgreSQL database?

Question

My PostgreSQL database has a table named "nodes" with about 1.7 million rows:

=#\d nodes
            Table "public.nodes"
 Column |          Type          | Modifiers 
--------+------------------------+-----------
 id     | integer                | not null
 title  | character varying(256) | 
 score  | double precision       | 
Indexes:
    "nodes_pkey" PRIMARY KEY, btree (id)

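I want to use the information in this table to autocomplete a search field, showing the user a list of the ten highest-scoring titles that match their input. So I used this query (here searching for all titles that start with "s"):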

=# explain analyze select title,score from nodes where title ilike 's%' order by score desc; 
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=64177.92..64581.38 rows=161385 width=25) (actual time=4930.334..5047.321 rows=161264 loops=1)
   Sort Key: score
   Sort Method:  external merge  Disk: 5712kB
   ->  Seq Scan on nodes  (cost=0.00..46630.50 rows=161385 width=25) (actual time=0.611..4464.413 rows=161264 loops=1)
         Filter: ((title)::text ~~* 's%'::text)
 Total runtime: 5260.791 ms
(6 rows)

This is far too slow for autocomplete. Following some information on using PostgreSQL in Web 2.0 applications, I was able to improve it with a special index:

=# create index title_idx on nodes using btree(lower(title) text_pattern_ops);
=# explain analyze select title,score from nodes where lower(title) like lower('s%') order by score desc limit 10;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=18122.41..18122.43 rows=10 width=25) (actual time=1324.703..1324.708 rows=10 loops=1)
   ->  Sort  (cost=18122.41..18144.60 rows=8876 width=25) (actual time=1324.700..1324.702 rows=10 loops=1)
         Sort Key: score
         Sort Method:  top-N heapsort  Memory: 17kB
         ->  Bitmap Heap Scan on nodes  (cost=243.53..17930.60 rows=8876 width=25) (actual time=96.124..1227.203 rows=161264 loops=1)
               Filter: (lower((title)::text) ~~ 's%'::text)
               ->  Bitmap Index Scan on title_idx  (cost=0.00..241.31 rows=8876 width=0) (actual time=90.059..90.059 rows=161264 loops=1)
                     Index Cond: ((lower((title)::text) ~>=~ 's'::text) AND (lower((title)::text) ~<~ 't'::text))
 Total runtime: 1325.085 ms
(9 rows)
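Part of the gain in the second plan comes from the LIMIT: instead of an external merge sort of all 161264 matches on disk, the planner uses a "top-N heapsort" that only keeps the current best 10 rows (17kB of memory). The Python sketch below illustrates the same idea with heapq.nlargest on made-up (title, score) pairs; it is an analogy for the plan node, not PostgreSQL's sort code.

```python
import heapq
import random


def top_n_by_score(rows, n=10):
    """Return the n rows with the highest score, best first.

    heapq.nlargest only keeps n rows in its heap at any time, which is the
    idea behind the "top-N heapsort" node, instead of sorting every match.
    """
    return heapq.nlargest(n, rows, key=lambda row: row[1])


def full_sort_then_limit(rows, n=10):
    """What the first query effectively did: sort all matches, then cut."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical (title, score) pairs standing in for the matching rows.
    rows = [(f"s-title-{i}", random.random()) for i in range(160_000)]
    assert top_n_by_score(rows) == full_sort_then_limit(rows)
    print(top_n_by_score(rows, 3))
```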

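So this gave me a 4x speedup. But can it be improved further? What if I want to use '%s%' instead of 's%'? Do I have any chance of getting decent performance with PostgreSQL in that case, or should I rather try a different solution (Lucene? Sphinx?) to implement my autocomplete feature?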

Answer 1

If you are not using the C locale, you need a text_pattern_ops index.

See: Index Types.
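The reason is that a default btree index orders text by the locale's collation rules, which in many locales are not a plain character-by-character comparison, so a range such as 's' <= x < 't' on such an index is not the same set of rows as LIKE 's%'. The toy Python model below makes this concrete; toy_collation_key is a hypothetical stand-in that simply ignores hyphens, far simpler than any real locale.

```python
def toy_collation_key(s):
    """Hypothetical locale-like sort key: hyphens carry no weight at all.

    Real collations are much more complex; this only mimics the fact that
    some of them give punctuation little weight when comparing strings.
    """
    return s.replace("-", "")


def range_scan(values, lo, hi, key):
    """Return values whose key lies in [lo, hi), like a btree range scan."""
    return [v for v in sorted(values, key=key) if lo <= key(v) < hi]


titles = ["sa", "s-b", "-sa", "ta", "ra"]

# Character-by-character ordering (what text_pattern_ops / the C locale use).
by_char = range_scan(titles, "s", "t", key=lambda s: s)

# Toy locale-style ordering: "-sa" sorts like "sa" and lands inside the range.
by_locale = range_scan(titles, "s", "t", key=toy_collation_key)

# What LIKE 's%' actually asks for.
prefix_matches = [t for t in titles if t.startswith("s")]

print("char order:  ", by_char)
print("toy locale:  ", by_locale)
print("LIKE 's%':   ", prefix_matches)
```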

Answer 2

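Hints for further investigation:

Partition the table on the title key. That makes the lists PostgreSQL has to work through smaller.
Give PostgreSQL more memory so that the cache hit ratio is above 98%. This table will probably need around 0.5 GB; 2 GB should be no problem these days. Make sure statistics collection is enabled and read up on the pg_stats table.
Create a second table holding a shortened version of each title, e.g. the first 12 characters, so the whole table fits into fewer database blocks. An index on a substring might work too, but it requires careful querying.
The longer the substring, the faster the query runs. Create a separate table for short substrings and store the top ten (or however many options you want to show) as the value. There are about 20000 combinations of 1-, 2- and 3-character strings.
If you want %abc% queries, you can use the same idea, but at that point switching to Lucene probably makes sense.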
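The precomputed-table idea from the hints above can be sketched in Python as follows. It assumes rows arrive as (title, score) pairs and keys prefixes on lowercased titles; in the database this would be a separate table rebuilt by a batch job, and the function names here are hypothetical. The counting helper checks the "about 20000" figure: with a 26-letter alphabet there are 26 + 26² + 26³ = 18,278 strings of length 1 to 3.

```python
import heapq
from collections import defaultdict


def build_prefix_table(rows, max_prefix_len=3, top_n=10):
    """Map each lowercase prefix of length 1..max_prefix_len to its top_n titles.

    rows is an iterable of (title, score) pairs. In the database this would
    be a separate, periodically rebuilt table keyed by prefix.
    """
    buckets = defaultdict(list)
    for title, score in rows:
        low = title.lower()
        for k in range(1, min(max_prefix_len, len(low)) + 1):
            buckets[low[:k]].append((score, title))
    # Keep only the best top_n titles per prefix, highest score first.
    return {
        prefix: [title for _score, title in heapq.nlargest(top_n, entries)]
        for prefix, entries in buckets.items()
    }


def suggest(table, typed):
    """Return precomputed suggestions, or None when there is no entry
    (no matching title, or input longer than the table covers, in which
    case the caller would fall back to the indexed SQL query)."""
    return table.get(typed.lower())


def count_short_prefixes(alphabet_size=26, max_len=3):
    """Number of distinct strings of length 1..max_len over an alphabet."""
    return sum(alphabet_size ** k for k in range(1, max_len + 1))
```

Lookups for one to three typed characters then become a single key lookup; only longer inputs need the prefix query from the question.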

Answer 3

You are obviously not interested in 150000+ results, so you should limit them:

select title,score
  from nodes
  where title ilike 's%'
  order by score desc
  limit 10;

You could also consider creating a functional index and using ">=" and "<":

create index nodes_title_lower_idx on nodes (lower(title));
select title,score
  from nodes
  where lower(title)>='s' and lower(title)<'t'
  order by score desc
  limit 10;
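The plan in the question shows the planner turning LIKE 's%' into the range ~>=~ 's' AND ~<~ 't', and the query above writes the same range by hand. For an arbitrary typed prefix, the upper bound is the prefix with its last character incremented (carrying left when that character is already the largest code point). A minimal Python sketch, assuming plain code-point ordering as with the C locale or text_pattern_ops; under a locale collation the range and the prefix test can disagree, as Answer 1 warns.

```python
def prefix_upper_bound(prefix):
    """Smallest string greater than every string that starts with `prefix`,
    using code-point order. Returns None when no upper bound exists.
    """
    chars = list(prefix)
    while chars:
        nxt = ord(chars[-1]) + 1
        if 0xD800 <= nxt <= 0xDFFF:   # skip surrogate code points (not valid text)
            nxt = 0xE000
        if nxt <= 0x10FFFF:
            chars[-1] = chr(nxt)
            return "".join(chars)
        chars.pop()                   # already the largest code point: carry left
    return None


def prefix_range_filter(titles, prefix):
    """Select titles with prefix <= t < upper bound, like a btree range scan."""
    hi = prefix_upper_bound(prefix)
    return [t for t in titles if t >= prefix and (hi is None or t < hi)]
```

For example, prefix_upper_bound("s") returns "t", matching the Index Cond in the question's plan.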

You should also create an index on score; that will help in the ilike '%s%' case.
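The index on score helps because, with LIMIT 10, PostgreSQL can read rows in descending score order from that index and stop as soon as ten rows pass the filter, instead of filtering the whole table and then sorting. The Python sketch below models that walk over rows assumed to be pre-sorted by score; the lowercase substring test only approximates ILIKE. The early stop pays off when matches are common; for a rare term the walk visits most of the table.

```python
def top_matches_by_score(rows_by_score_desc, term, n=10):
    """Walk rows in descending score order (as a backward scan of an index
    on score would return them) and stop after n substring matches.

    Returns (matching titles, number of rows examined).
    """
    needle = term.lower()
    found = []
    examined = 0
    for title, _score in rows_by_score_desc:
        examined += 1
        if needle in title.lower():      # rough stand-in for ILIKE '%term%'
            found.append(title)
            if len(found) == n:
                break                    # LIMIT reached: no need to look further
    return found, examined
```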

Answer 4

Prefix + suffix at the same time

LIKE '%abc%' can be sped up with a gin index using pg_trgm.

Usage:

CREATE EXTENSION pg_trgm;
CREATE TABLE "mytable" ("col1" TEXT);
CREATE INDEX "mytable_col1_gin" ON "mytable" USING gin("col1" gin_trgm_ops);
EXPLAIN SELECT * FROM "mytable" WHERE "col1" LIKE '%abc%';

Produces:

                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Bitmap Heap Scan on mytable  (cost=15.10..104.10 rows=400 width=72)
   Recheck Cond: (col1 ~~ '%abc%'::text)
   ->  Bitmap Index Scan on mytable_col1_gin  (cost=0.00..15.00 rows=400 width=0)
         Index Cond: (col1 ~~ '%abc%'::text)

This shows that the trigram index mytable_col1_gin is used, so the query is sped up.
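The Recheck Cond line reflects how a trigram index is used for LIKE: the GIN index returns rows that contain every trigram of the search term, and each candidate must then be checked against the real pattern, because sharing all trigrams does not guarantee that the term appears as one contiguous substring. The Python sketch below builds a toy inverted index to show both steps; it is simplified (no word splitting or space padding as pg_trgm does), and TrigramIndex is a hypothetical name.

```python
from collections import defaultdict


def trigrams(text):
    """All 3-character substrings of the lowercased text (no padding)."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


class TrigramIndex:
    """Toy inverted index: trigram -> ids of rows containing it."""

    def __init__(self, texts):
        self.texts = list(texts)
        self.postings = defaultdict(set)
        for row_id, text in enumerate(self.texts):
            for gram in trigrams(text):
                self.postings[gram].add(row_id)

    def candidates(self, term):
        """Index step: rows that contain every trigram of the term."""
        grams = trigrams(term)
        if not grams:
            # Nothing to look up: the index cannot narrow the search at all.
            return set(range(len(self.texts)))
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, term):
        """Candidates plus recheck, roughly like ILIKE '%term%'."""
        needle = term.lower()
        return sorted(i for i in self.candidates(term) if needle in self.texts[i].lower())
```

For instance, "bcabc" contains all trigrams of "abcab" (abc, bca, cab) without containing "abcab" itself, so it is a candidate that the recheck discards.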

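Remarks:

Dedicated question: Postgres Select ILIKE %text% is Slow On Large String Rows
By default, PostgreSQL only uses the index if the search term abc has three or more characters (i.e. at least one trigram). See: I want to search 2 characters, is there any solution? (Trigram index only works for at least 3 characters)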
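The three-character limit follows from how trigrams are extracted. A stored word is padded with two spaces in front and one behind (the pg_trgm documentation lists the trigrams of "cat" as "  c", " ca", "cat" and "at "), but the literal fragment of an unanchored pattern like '%ab%' may sit in the middle of a word, so in effect it gets no padding, and two characters contain no complete trigram for the index to look up. A simplified Python model of both cases (the function names are illustrative, not pg_trgm's API):

```python
def word_trigrams(word):
    """Trigrams of a stored word, padded pg_trgm-style: two spaces before, one after."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def unanchored_trigrams(fragment):
    """Trigrams of the literal part of '%fragment%': no padding on either side,
    because the fragment may appear anywhere inside a word."""
    f = fragment.lower()
    return {f[i:i + 3] for i in range(len(f) - 2)}


for term in ["ab", "abc", "abcd"]:
    grams = unanchored_trigrams(term)
    status = "index can narrow" if grams else "no trigram: index cannot narrow"
    print(f"%{term}%", sorted(grams), status)
```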
