How to manage synonyms for entities (e.g. sports teams) that can be referred to by multiple names when searching
The following tags refer to the Dallas Mavericks team - https://www.mavs.com/
Mavs, dallasmavs, DALLAS MAVERICKS
The following tags refer to the Trail Blazers team - https://www.nba.com/blazers
Portland Trail Blazers, Trail Blazers
The following tags refer to the Los Angeles Lakers team - https://www.nba.com/lakers/
lakers, Lakers
The following tags refer to the Phoenix Suns team
phoenix suns, Phoenix Suns
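I want to use the synonyms feature to match an entity (for example, a sports team) based on the tags above. How can I treat the team as a first-class document that I can search for by any of its tags, so that the main team is returned as the result? How do I model this entity in Elasticsearch and set up the synonyms easily?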
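As mentioned in the comments, you may want to look at the synonyms API, which has been available in the stack since version 8.10.
Creating a synonyms set is as simple as this: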
PUT _synonyms/my-synonyms-set
{
  "synonyms_set": [
    {
      "id": "test-1",
      "synonyms": "hello, hi, ciao"
    }
  ]
}
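To check what was stored, or to change a single rule without resending the whole set, the synonyms API also offers read and per-rule endpoints. The requests below are a minimal sketch that reuses the my-synonyms-set name and the test-1 rule id from the example above; the extra synonym howdy is only an illustration and not part of the original set.

```
# Retrieve the synonyms set to confirm what was stored
GET _synonyms/my-synonyms-set

# Create or update one rule by its id (howdy is a made-up addition)
PUT _synonyms/my-synonyms-set/test-1
{
  "synonyms": "hello, hi, ciao, howdy"
}
```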
For your specific case, I am creating the following synonyms:
PUT _synonyms/sport_teams_synonyms
{
  "synonyms_set": [
    {
      "synonyms": "dallas mavericks => mavs, dallasmavs, mavericks"
    },
    {
      "synonyms": "portland trail blazers, trail blazers => ptb"
    }
  ]
}
Then create the following index:
PUT sport_teams_match
{
  "settings": {
    "analysis": {
      "filter": {
        "sts_filter": {
          "type": "synonym_graph",
          "synonyms_set": "sport_teams_synonyms",
          "updateable": true
        }
      },
      "analyzer": {
        "sport_teams_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "sts_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "team_1": {
        "type": "text",
        "search_analyzer": "sport_teams_analyzer"
      },
      "team_2": {
        "type": "text",
        "search_analyzer": "sport_teams_analyzer"
      }
    }
  }
}
Load a few documents:
PUT _bulk
{ "index" : { "_index" : "sport_teams_match"} }
{ "team_1" : "mavs", "team_2": "lakers" }
{ "index" : { "_index" : "sport_teams_match"} }
{ "team_1" : "trail blazers", "team_2": "lakers" }
The following search queries should find the first document:
GET sport_teams_match/_search?q=team_1:"Mavericks"
GET sport_teams_match/_search?q=team_1:"Dallas Mavericks"
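The q= URI searches above can also be written with a request body, which is easier to extend later. This is only a sketch, assuming a match_phrase query is an acceptable stand-in for the quoted query string; the query text is still analyzed with the search_analyzer configured on team_1.

```
# Request-body form of the phrase search on team_1
GET sport_teams_match/_search
{
  "query": {
    "match_phrase": {
      "team_1": "Dallas Mavericks"
    }
  }
}
```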
Great, now let's try Trail Blazers:
GET sport_teams_match/_search?q=team_1:"Trail Blazers"
Uh-oh, it doesn't work. Why? The _analyze API comes to the rescue: given a specific analyzer pipeline and some text, it returns the extracted tokens.
POST sport_teams_match/_analyze
{
  "analyzer": "sport_teams_analyzer",
  "text": "Trail Blazers"
}
POST sport_teams_match/_analyze
{
  "analyzer": "standard",
  "text": "trail blazers"
}
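You will see:
sport_teams_analyzer => ptb
standard => trail, blazers
The search analyzer rewrites the query to ptb, while the document was indexed with the default standard analyzer as trail and blazers, so the two never meet.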
How can we fix this? Perhaps ptb is not such a good synonym after all?
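One possible way out, offered as a sketch rather than a definitive answer: replace the explicit mappings (=>) with equivalent, comma-separated synonyms, so that a query for any of the names is expanded to all of them, including the form that was actually indexed. The rule ids and the exact lists of names below are assumptions, so adjust them to your own tags. Because the filter is updateable and only used at search time, updating the set through the synonyms API should reload the search analyzers without reindexing.

```
# Replaces the whole sport_teams_synonyms set with equivalent rules
# (the ids and name lists are illustrative assumptions)
PUT _synonyms/sport_teams_synonyms
{
  "synonyms_set": [
    {
      "id": "dallas-mavericks",
      "synonyms": "dallas mavericks, mavericks, mavs, dallasmavs"
    },
    {
      "id": "portland-trail-blazers",
      "synonyms": "portland trail blazers, trail blazers, ptb"
    }
  ]
}

# Re-run the failing search; the query now keeps "trail blazers" as one alternative
GET sport_teams_match/_search?q=team_1:"Trail Blazers"
```

Running the _analyze request for sport_teams_analyzer again is a quick way to confirm that the original tokens are now kept alongside the synonyms.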