Creating a Gin index with Trigram (gin_trgm_ops) in a Django model

Question

The new TrigramSimilarity feature in django.contrib.postgres was a great fit for a problem I had. I use it in a search bar to find hard-to-spell Latin names. The problem is that there are over 2 million names, and the search takes longer than I would like.

I would like to create an index on the trigrams, as described in the postgres documentation: https://www.postgresql.org/docs/9.6/static/pgtrgm.html

But I am not sure how to do this in a way that the Django API would make use of it. For postgres text search there is a description of how to create an index, but not for trigram similarity: https://docs.djangoproject.com/en/1.11/ref/contrib/postgres/search/#performance

This is what I have right now:

from django.contrib.postgres.indexes import GinIndex
from django.db import models

class NCBI_names(models.Model):
    tax_id = models.ForeignKey(NCBI_nodes, on_delete=models.CASCADE, default=0)
    name_txt = models.CharField(max_length=255, default='')
    name_class = models.CharField(max_length=32, db_index=True, default='')

    class Meta:
        indexes = [GinIndex(fields=['name_txt'])]

And then in the view's get_queryset I do:

from django.contrib.postgres.search import TrigramSimilarity
from django.shortcuts import redirect
from django.views.generic import ListView

class TaxonSearchListView(ListView):
    #form_class=TaxonSearchForm
    template_name = 'collectie/taxon_list.html'
    paginate_by = 20
    model = NCBI_names
    context_object_name = 'taxon_list'

    def dispatch(self, request, *args, **kwargs):
        query = request.GET.get('q')
        if query:
            try:
                # An exact (case-insensitive) match goes straight to the detail page.
                tax_id = self.model.objects.get(name_txt__iexact=query).tax_id.tax_id
                return redirect('collectie:taxon_detail', tax_id)
            except (self.model.DoesNotExist, self.model.MultipleObjectsReturned):
                return super(TaxonSearchListView, self).dispatch(request, *args, **kwargs)
        else:
            return super(TaxonSearchListView, self).dispatch(request, *args, **kwargs)

    def get_queryset(self):
        result = super(TaxonSearchListView, self).get_queryset()
        query = self.request.GET.get('q')
        if query:
            result = result.exclude(name_txt__icontains='sp.')
            result = result.annotate(similarity=TrigramSimilarity('name_txt', query)).filter(similarity__gt=0.3).order_by('-similarity')
        return result

Edit: added the whole view class.

Answer 1

I had a similar problem, trying to use the pg_trgm extension to support efficient contains and icontains Django field lookups.
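For context, this is roughly what pg_trgm compares when it scores two strings, and what a gin_trgm_ops index stores. The following is a pure-Python sketch, not the extension's actual code: it assumes ASCII input, and the function names trigrams and similarity are hypothetical helpers, not part of pg_trgm or Django.

```python
import re


def trigrams(text):
    """Return the set of trigrams for text, roughly following pg_trgm's rules."""
    grams = set()
    # Assumption: ASCII only. Input is lowercased and split on non-alphanumerics.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces in front, one behind
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by the number of distinct trigrams in both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


score = similarity("word", "two words")  # 4 shared trigrams out of 11 distinct
```

Under these assumptions, "word" and "two words" score 4/11, about 0.36, so the pair would pass the similarity__gt=0.3 filter used in the question.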

There may be a more elegant way, but defining a new index type like this worked for me:

from django.contrib.postgres.indexes import GinIndex

class TrigramIndex(GinIndex):
    def get_sql_create_template_values(self, model, schema_editor, using):
        fields = [model._meta.get_field(field_name) for field_name, order in self.fields_orders]
        tablespace_sql = schema_editor._get_index_tablespace_sql(model, fields)
        quote_name = schema_editor.quote_name
        # Same as the parent implementation, except that every column gets
        # the gin_trgm_ops operator class appended.
        columns = [
            ('%s %s' % (quote_name(field.column), order)).strip() + ' gin_trgm_ops'
            for field, (field_name, order) in zip(fields, self.fields_orders)
        ]
        return {
            'table': quote_name(model._meta.db_table),
            'name': quote_name(self.name),
            'columns': ', '.join(columns),
            'using': using,
            'extra': tablespace_sql,
        }

The method get_sql_create_template_values is copied from Index.get_sql_create_template_values(), with only one modification: the addition of + ' gin_trgm_ops'.

For your use case, you would define the index on name_txt using this TrigramIndex instead of GinIndex. Then run makemigrations, which will produce a migration that generates the required CREATE INDEX SQL.

Update:

I see you are also doing a query using icontains:

result.exclude(name_txt__icontains = 'sp.')

The PostgreSQL backend will turn that into something like this:

UPPER("NCBI_names"."name_txt"::text) LIKE UPPER('sp.')

and then the trigram index will not be used because of the UPPER().

I had the same problem, and ended up subclassing the database backend to work around it:

from django.db.backends.postgresql import base, operations

class DatabaseFeatures(base.DatabaseFeatures):
    pass

class DatabaseOperations(operations.DatabaseOperations):
    def lookup_cast(self, lookup_type, internal_type=None):
        lookup = '%s'

        # Cast text lookups to text to allow things like filter(x__contains=4)
        if lookup_type in ('iexact', 'contains', 'icontains', 'startswith',
                           'istartswith', 'endswith', 'iendswith', 'regex', 'iregex'):
            if internal_type in ('IPAddressField', 'GenericIPAddressField'):
                lookup = "HOST(%s)"
            else:
                lookup = "%s::text"

        return lookup

class DatabaseWrapper(base.DatabaseWrapper):
    """
    Override the defaults where needed to allow use of trigram index
    """
    ops_class = DatabaseOperations

    def __init__(self, *args, **kwargs):
        # Use ILIKE instead of UPPER(...) LIKE UPPER(...) for case-insensitive lookups.
        self.operators.update({
            'icontains': 'ILIKE %s',
            'istartswith': 'ILIKE %s',
            'iendswith': 'ILIKE %s',
        })
        self.pattern_ops.update({
            'icontains': "ILIKE '%%' || {} || '%%'",
            'istartswith': "ILIKE {} || '%%'",
            'iendswith': "ILIKE '%%' || {}",
        })
        super(DatabaseWrapper, self).__init__(*args, **kwargs)

Answer 2

Inspired by an old article on this subject, I landed on a current one, which gives the solution further down for a GistIndex.

Update: from Django 1.11 things seem to be simpler, as this answer and the Django docs suggest:

from django.contrib.postgres.indexes import GinIndex
from django.db import models

class MyModel(models.Model):
    the_field = models.CharField(max_length=512, db_index=True)

    class Meta:
        indexes = [GinIndex(fields=['the_field'])]

From Django 2.2, an opclasses attribute will be available for this purpose in:

class Index(fields=(), name=None, db_tablespace=None, opclasses=())

The GistIndex solution:

from django.contrib.postgres.indexes import GistIndex

class GistIndexTrgrmOps(GistIndex):
    def create_sql(self, model, schema_editor):
        # - this Statement is instantiated by the _create_index_sql()
        #   method of django.db.backends.base.schema.BaseDatabaseSchemaEditor.
        #   using sql_create_index template from
        #   django.db.backends.postgresql.schema.DatabaseSchemaEditor
        # - the template has original value:
        #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s)%(extra)s"
        statement = super().create_sql(model, schema_editor)
        # - however, we want to use a GIST index to accelerate trigram
        #   matching, so we want to add the gist_trgm_ops index operator
        #   class
        # - so we replace the template with:
        #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"
        statement.template =\
            "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"

        return statement

You can then use it in your model class like this:

class YourModel(models.Model):
    some_field = models.TextField(...)

    class Meta:
        indexes = [
            GistIndexTrgrmOps(fields=['some_field'])
        ]

Answer 3

If someone wants an index on multiple columns joined (concatenated) with a space, you can use my modification of the built-in index.

It creates an index like this:

gin (("column1" || ' ' || "column2" || ' ' || ...) gin_trgm_ops)
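To make the shape of that statement concrete, here is a minimal sketch of how such an expression could be assembled. It is plain string building for illustration only, not the index class from this answer and not a Django API; quote_name, concat_expression, create_trigram_index_sql, and the index and table names are all hypothetical.

```python
def quote_name(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"%s"' % name.replace('"', '""')


def concat_expression(columns, separator=" "):
    """Build ("col1" || ' ' || "col2" ...) for use in an expression index."""
    literal = "'%s'" % separator.replace("'", "''")
    joined = (" || %s || " % literal).join(quote_name(c) for c in columns)
    return "(%s)" % joined


def create_trigram_index_sql(index_name, table, columns):
    """Render CREATE INDEX ... USING gin ((expression) gin_trgm_ops)."""
    return "CREATE INDEX %s ON %s USING gin (%s gin_trgm_ops)" % (
        quote_name(index_name),
        quote_name(table),
        concat_expression(columns),
    )


# Hypothetical index and table names, with the column names used above.
sql = create_trigram_index_sql("names_trgm_idx", "ncbi_names", ["column1", "column2"])
print(sql)
```

Keep in mind that PostgreSQL only considers an expression index when the query uses the same expression, so a search would have to filter on the identical concatenation rather than on the individual columns.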
