How to manage synonyms for an entity


How do I manage synonyms for an entity (for example, a sports team) that can be referred to by multiple names when searching?

The following tags refer to Dallas Mavericks team - https://www.mavs.com/
Mavs, dallasmavs, DALLAS MAVERICKS

The following tags refer to Trail Blazers team - https://www.nba.com/blazers
Portland Trail Blazers, Trail Blazers 

The following tags refer to Los Angeles Lakers team - https://www.nba.com/lakers/
lakers, Lakers

The following tags refer to Phoenix Suns team
phoenix suns, Phoenix Suns

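I want to use the synonyms feature to match an entity (for example, a sports team) based on the tags above. How can I treat the team as a first-class document, so that searching by any of its tags returns the main team? How should I model this entity in Elasticsearch and set up its synonyms easily?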

elasticsearch
1 Answer

TL;DR

As mentioned in the comments, you may want to look at the synonyms API, which has been available in the stack since version 8.10.

Creating a synonyms set is as simple as:

PUT _synonyms/my-synonyms-set
{
  "synonyms_set": [
    {
      "id": "test-1",
      "synonyms": "hello, hi, ciao"
    }
  ]
}
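
Assuming the set above was created as shown, a quick way to inspect it and to change a single rule might look like the sketch below. The replacement text "hello, hi, howdy" is only an illustration.

```
# Retrieve the synonyms set to confirm its rules
GET _synonyms/my-synonyms-set

# Replace a single rule, addressed by its id
PUT _synonyms/my-synonyms-set/test-1
{
  "synonyms": "hello, hi, howdy"
}
```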

Demo

For your specific case, I am creating the following synonyms:

PUT _synonyms/sport_teams_synonyms
{
  "synonyms_set": [
    {
      "synonyms": "dallas mavericks => mavs, dallasmavs, mavericks"
    },
    {
      "synonyms": "portland trail blazers, trail blazers => ptb"
    }
  ]
}

Then create the following index:

PUT sport_teams_match
{
  "settings": {
    "analysis": {
      "filter": {
        "sts_filter": {
          "type": "synonym_graph",
          "synonyms_set": "sport_teams_synonyms",
          "updateable": true
        }
      },
      "analyzer": {
        "sport_teams_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "sts_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "team_1": {
        "type": "text",
        "search_analyzer": "sport_teams_analyzer"
      },
      "team_2": {
        "type": "text",
        "search_analyzer": "sport_teams_analyzer"
      }
    }
  }
}

Load some documents:

PUT _bulk
{ "index" : { "_index" : "sport_teams_match"} }
{ "team_1" : "mavs", "team_2": "lakers" }
{ "index" : { "_index" : "sport_teams_match"} }
{ "team_1" : "trail blazers", "team_2": "lakers" }

The following search queries should find the first document:

GET sport_teams_match/_search?q=team_1:"Mavericks"
GET sport_teams_match/_search?q=team_1:"Dallas Mavericks"
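
The same phrase search can also be written with a request body, which is easier to extend than the q= parameter. This is a sketch assuming the index and mapping above; match_phrase is an assumption standing in for the quoted q= form, not something the original query uses.

```
# Phrase search on team_1; the text goes through the field's search_analyzer
GET sport_teams_match/_search
{
  "query": {
    "match_phrase": {
      "team_1": "Dallas Mavericks"
    }
  }
}
```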

Great, now let's try Trail Blazers:

GET sport_teams_match/_search?q=team_1:"Trail Blazers"

Uh-oh, it doesn't work. Why? The _analyze API comes to the rescue. Given a specific analyzer pipeline and some text, this API returns the extracted tokens.

POST sport_teams_match/_analyze
{
  "analyzer": "sport_teams_analyzer",
  "text":     "Trail Blazers"
}

POST sport_teams_match/_analyze
{
  "analyzer": "standard",
  "text":     "trail blazers"
}

You will see:

  • sport_teams_analyzer => ptb
  • standard => trail, blazers

How can we fix this?

Maybe ptb is not such a good synonym after all?
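
One possible fix, offered as a sketch rather than a definitive answer: drop the explicit => mappings and list the names as equivalent synonyms. An explicit mapping replaces the left-hand side with the right-hand side, so "trail blazers" is rewritten to ptb, a token that was never indexed. A comma-separated rule without => keeps the original tokens and adds the alternatives. The rule ids below are made up for illustration.

```
# Redefine the set with equivalent synonyms instead of explicit mappings
PUT _synonyms/sport_teams_synonyms
{
  "synonyms_set": [
    {
      "id": "dallas-mavericks",
      "synonyms": "dallas mavericks, mavericks, mavs, dallasmavs"
    },
    {
      "id": "trail-blazers",
      "synonyms": "portland trail blazers, trail blazers"
    }
  ]
}

# Retry the search that failed before
GET sport_teams_match/_search?q=team_1:"Trail Blazers"
```

Updating a synonyms set reloads the search analyzers that use it, so the index should not need to be recreated. With these rules the query text "Trail Blazers" keeps its own tokens, so the retried search should return the second document.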
