Stop words not working for Ukrainian [Elasticsearch]


I have the following problem. There is a product:

Папір офісний Double A, A5 (148 х 210 мм), Premium 80г/м2 500 аркушів

I search for:

Папір офісний 500 аркушів
- everything works, the product is found.

But when I search for:

Папір офісний на 500 аркушів
- the product is not found.

What could be the problem? I tried adding "на" to the stopwords list, but it made no difference.

I need words such as "on", "in", "and"... not to affect the search query in any way; such words are normally ignored in search queries.

Image: elasticsearch:7.9.1

This is the code that creates the index:

public function createIndex()
    {
        $mappingParams = [
            'index' => $this->getIndexName(),
            'body' => [
                'mappings' => [
                    '_source' => [
                        'enabled' => true
                    ],
                    'properties' => $this->getMappingProperties(),
                ],
                'settings' => [
                    'analysis' => [
                        'normalizer' => [
                            'lowercase_keyword' => [
                                'type' => 'custom',
                                'filter' => ['lowercase', 'trim'],
                            ],
                        ],
                        'tokenizer' => [
                            'ngram_tokenizer' => [
                                'type' => 'edge_ngram',
                                'min_gram' => 1,
                                'max_gram' => 15,
                                'token_chars' => [
                                    'letter',
                                    'digit',
                                ],
                            ],
                        ],
                        'filter' => [
                            'synonym_filter' => [
                                'type' => 'synonym',
                                'synonyms' => [
                                    'аркуш, сторінка', 'арк, ст', 'аркуш, ст', 'сторінка, арк', 'автомобіль, машина',
                                    'тетрадь, зошит', 'кошелек, гаманець', 'на' => ' ',
                                ],
                            ],
                            'uk_stopwords' => [
                                'type' => 'stop',
                                'stopwords' => ['на', 'та', 'і', 'at', 'TY358'],
                            ],
                        ],
                        'analyzer' => [
                            'ngram_analyzer' => [
                                'type' => 'custom',
                                'tokenizer' => 'ngram_tokenizer',
                                'filter' => [
                                    'lowercase',
                                    'trim',
                                    'synonym_filter',
                                    'uk_stopwords',
                                    'stop',
                                ],
                            ],
                        ],
                    ],
                    'index' => [
                        'max_result_window' => intval(\Variable::getArray('settings.elasticsearch.index_size', 100000)),
                    ],
                ],
            ],
        ];

        return $this->getClient()?->indices()->create($mappingParams);
    }
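To check what ngram_analyzer actually emits for a given query, the index's _analyze API can be used. A minimal sketch, assuming the 7.x elasticsearch-php client, whose indices()->analyze() takes "index" and "body" parameters; analyzeParams and tokenStrings are hypothetical helper names, and the client call itself is left as a comment because it needs a running cluster.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: parameters for the _analyze API, which returns the
// tokens a given analyzer produces for a piece of text.
function analyzeParams(string $index, string $analyzer, string $text): array
{
    return [
        'index' => $index,
        'body'  => [
            'analyzer' => $analyzer,
            'text'     => $text,
        ],
    ];
}

// Hypothetical helper: extracts the plain token strings from an _analyze
// response, which has the shape ['tokens' => [['token' => ...], ...]].
function tokenStrings(array $response): array
{
    return array_column($response['tokens'] ?? [], 'token');
}

// Usage inside the same class as createIndex() (not executed here):
// $response = $this->getClient()->indices()->analyze(
//     analyzeParams($this->getIndexName(), 'ngram_analyzer', 'Папір офісний на 500 аркушів')
// );
// print_r(tokenStrings($response));
```

Comparing the token list for the query with and without "на" shows which extra tokens the word contributes.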

This is the index mapping:

protected array $mappingProperties = [
        'id' => [
            'type' => 'keyword',
        ],
        'name' => [
            'type' => 'text',
            'fielddata' => true,
            'analyzer' => 'ngram_analyzer',
            'search_analyzer' => 'standard',
            'fields' => [
                'keyword' => [
                    'type' => 'keyword',
                    'normalizer' => 'lowercase_keyword',
                ]
            ],
        ],
        'name_ru' => [
            'type' => 'text',
            'fielddata' => true,
            'analyzer' => 'ngram_analyzer',
            'search_analyzer' => 'standard',
            'fields' => [
                'keyword' => [
                    'type' => 'keyword',
                    'normalizer' => 'lowercase_keyword',
                ]
            ],
        ],
        'body' => [
            'type' => 'text',
        ],
        'body_ru' => [
            'type' => 'text',
        ],
        'price' => [
            'type' => 'object',
        ],
        'extern_id' => [
            'type' => 'keyword',
            'fields' => [
                'long' => [
                    'type' => 'long',
                ]
            ],
        ],
        'gtin' => [
            'type' => 'keyword',
        ],
        'artikul' => [
            'type' => 'text',
            'fielddata' => true,
            'fields' => [
                'keyword' => [
                    'type' => 'keyword',
                    'normalizer' => 'lowercase_keyword',
                ]
            ],
        ],
        'gpc' => [
            'type' => 'integer',
        ],
        'rating' => [
            'type' => 'float',
        ],
        'status' => [
            'type' => 'keyword',
        ],
        'availability' => [
            'type' => 'integer',
        ],
        'created_at' => [
            'type' => 'date',
        ],
        'category_id' => [
            'type' => 'keyword',
        ],
        'categories_ids' => [
            'type' => 'keyword',
        ],
        'brand_id' => [
            'type' => 'keyword',
        ],
        'properties' => [
            'type' => 'object',
        ],
        'is_feed' => [
            'type' => 'boolean',
        ],
        'is_prior' => [
            'type' => 'boolean',
        ],
        'is_new' => [
            'type' => 'boolean',
        ],
        'is_action' => [
            'type' => 'boolean',
        ],
        'is_popular' => [
            'type' => 'boolean',
        ],
        'is_showonmain' => [
            'type' => 'boolean',
        ],
        'is_freedelivery' => [
            'type' => 'boolean',
        ],
        'has_in_gurt' => [
            'type' => 'boolean',
        ],
        'has_in_fop' => [
            'type' => 'boolean',
        ],
    ];

And this is how the search query on the name (a text field) is built:

$scoreSort = false;
$query = [];

if ($value = Arr::get($params, 'q')) {
    // mb_substr instead of substr: substr counts bytes and can cut a Cyrillic
    // (two-byte) character in half, producing invalid UTF-8.
    $shortValue = mb_substr($value, 0, 100);

    $termQuery = [
        'query' => [
            'term' => [
                'extern_id' => [
                    'value' => $value,
                ],
            ],
        ],
    ];

    $termResponse = $this->searchOnElasticsearch($termQuery);

    $scoreSort = true;
    if ($termResponse['hits']['total']['value'] > 0) {
        $query['query']['bool']['must'][] = [
            'term' => [
                'extern_id' => $shortValue,
            ],
        ];
    } else {
        // Ukrainian names are stored in "name", other locales in "name_{$locale}".
        $locale = app()->getLocale();
        $nameField = $locale === 'uk' ? 'name' : "name_{$locale}";
        $fields = ['extern_id^15', 'gtin^10', 'artikul^10', "{$nameField}^5"];

        $query['query']['bool']['must'][] = [
            'bool' => [
                'should' => [
                    [
                        'term' => [
                            "{$nameField}.keyword" => $value,
                        ],
                    ],
                    [
                        'multi_match' => [
                            'fields' => $fields,
                            'query' => $shortValue,
                            'fuzziness' => $this->getFuzziness($shortValue),
                            'prefix_length' => 3,
                            'operator' => 'AND',
                            // Overrides the mapping's search_analyzer ('standard').
                            'analyzer' => 'ngram_analyzer',
                        ],
                    ],
                ],
            ],
        ];
    }
}
Tags: php, laravel, elasticsearch, stop-words

Answer (0 votes)

In case anyone is interested:

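The problem was that the analyzer uses the edge_ngram tokenizer; that is why the stop words had no effect. The tokenizer runs before any token filter, so with "min_gram" => 1 the word "на" is first split into the fragments "н" and "на". The stop filter removes only the token that is exactly "на"; the fragment "н" survives, and because the query uses "operator" => "AND", that fragment has to match too, which it apparently cannot, since no word in the product name starts with "н".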
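To make the order of operations concrete, here is a small PHP emulation of what ngram_analyzer does to the text. It is an approximation under stated assumptions: words, edgeNgrams, stopFilter and matchesAll are hypothetical hand-written helpers (not Elasticsearch APIs), only the Ukrainian stop list is applied, and operator AND is simplified to "every query token must occur among the product name's tokens", ignoring fuzziness and the other fields.

```php
<?php
declare(strict_types=1);

// Split text into runs of letters and digits, roughly like token_chars ['letter', 'digit'].
function words(string $text): array
{
    preg_match_all('/[\p{L}\p{N}]+/u', $text, $matches);
    return $matches[0];
}

// Emulate edge_ngram (min_gram 1, max_gram 15): every prefix of every word.
function edgeNgrams(array $words, int $minGram = 1, int $maxGram = 15): array
{
    $tokens = [];
    foreach ($words as $word) {
        // Split by Unicode character so Cyrillic letters are not cut in half.
        $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
        $limit = min($maxGram, count($chars));
        for ($n = $minGram; $n <= $limit; $n++) {
            $tokens[] = implode('', array_slice($chars, 0, $n));
        }
    }
    return $tokens;
}

// Emulate the stop filter: drop tokens that exactly equal a stop word.
function stopFilter(array $tokens, array $stopwords): array
{
    return array_values(array_filter(
        $tokens,
        fn (string $t): bool => !in_array($t, $stopwords, true)
    ));
}

// Simplified operator AND: every query token must appear among the indexed tokens.
function matchesAll(array $queryTokens, array $docTokens): bool
{
    return array_diff($queryTokens, $docTokens) === [];
}

$stopwords = ['на', 'та', 'і'];
// Product name, already lowercased as the lowercase filter would do.
$doc = 'папір офісний double a, a5 (148 х 210 мм), premium 80г/м2 500 аркушів';
$docTokens = stopFilter(edgeNgrams(words($doc)), $stopwords);

// ngram_analyzer order: tokenizer (n-grams) first, stop filter afterwards.
$analyze = fn (string $q): array => stopFilter(edgeNgrams(words($q)), $stopwords);

echo json_encode($analyze('на 500'), JSON_UNESCAPED_UNICODE), PHP_EOL; // ["н","5","50","500"]
var_dump(matchesAll($analyze('папір офісний 500 аркушів'), $docTokens));    // bool(true)
var_dump(matchesAll($analyze('папір офісний на 500 аркушів'), $docTokens)); // bool(false)

// For comparison: removing stop words from whole words before n-gramming.
$stopFirst = edgeNgrams(stopFilter(words('папір офісний на 500 аркушів'), $stopwords));
var_dump(matchesAll($stopFirst, $docTokens)); // bool(true)
```

The last comparison shows why splitting the query into whole words first lets the stop filter do its job; that is the role the standard tokenizer plays in the fix below.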

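So I settled on the following approach:

  1. search with the edge_ngram analyzer, as before;
  2. additionally, to improve the results, also search with the standard tokenizer.

Both clauses go into one bool "should", so a product is returned if either of them matches. The analyzers "{$locale}_standard_analyzer" and "{$locale}_ngram_analyzer" are not part of the index settings shown in the question, so presumably the index was recreated with per-locale analyzers; for the fix to work, the standard one has to include the stop-word filter.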
$query['query']['bool']['must'][] = [
    'bool' => [
        'should' => [
            // Whole words via the standard tokenizer: "на" stays one token, so a
            // stop filter in this analyzer (if it has one) can remove it.
            [
                'match' => [
                    $nameField => [
                        'query' => $value,
                        'fuzziness' => $this->getFuzziness($value),
                        'prefix_length' => 3,
                        'operator' => 'AND',
                        'analyzer' => "{$locale}_standard_analyzer",
                    ],
                ],
            ],
            // Prefix matching via edge_ngram, as before.
            [
                'multi_match' => [
                    'fields' => $fields,
                    'query' => $value,
                    'fuzziness' => $this->getFuzziness($value),
                    'prefix_length' => 3,
                    'operator' => 'AND',
                    'analyzer' => "{$locale}_ngram_analyzer",
                ],
            ],
        ],
    ],
];
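The two analyzer names in this query are built from the locale, so for "uk" the index needs analyzers called uk_standard_analyzer and uk_ngram_analyzer. Their definitions are not shown in the answer; the sketch below is one possible setup, an assumption rather than the author's configuration. localeAnalysis is a hypothetical helper returning the "analysis" part of the createIndex() settings, and the synonym filter is left out for brevity.

```php
<?php
declare(strict_types=1);

// Hypothetical per-locale analyzers matching the names used in the query:
// "{$locale}_standard_analyzer" and "{$locale}_ngram_analyzer".
function localeAnalysis(string $locale, array $stopwords): array
{
    $stopFilter = "{$locale}_stopwords";

    return [
        'tokenizer' => [
            'ngram_tokenizer' => [
                'type' => 'edge_ngram',
                'min_gram' => 1,
                'max_gram' => 15,
                'token_chars' => ['letter', 'digit'],
            ],
        ],
        'filter' => [
            $stopFilter => [
                'type' => 'stop',
                'stopwords' => $stopwords,
            ],
        ],
        'analyzer' => [
            // Whole words: the stop filter sees "на" as a complete token and drops it.
            "{$locale}_standard_analyzer" => [
                'type' => 'custom',
                'tokenizer' => 'standard',
                'filter' => ['lowercase', $stopFilter],
            ],
            // Prefix matching as in the original ngram_analyzer; here the stop
            // filter only sees n-gram fragments, so "н" from "на" survives.
            "{$locale}_ngram_analyzer" => [
                'type' => 'custom',
                'tokenizer' => 'ngram_tokenizer',
                'filter' => ['lowercase', $stopFilter],
            ],
        ],
    ];
}

$settings = ['analysis' => localeAnalysis('uk', ['на', 'та', 'і'])];
echo json_encode($settings, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT), PHP_EOL;
```

New analyzers cannot be added to an open index, so in practice the index has to be closed and updated, or recreated through createIndex() and reindexed.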
© www.soinside.com 2019 - 2024. All rights reserved.