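What I want to do is index French words in several forms as synonyms. For example, l'ami should be indexed as-is plus two synonyms, "lami" and "l ami", so the synonym graph for this word looks like this: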
---l---ami--
| |
---l'ami----
| |
---lami-----
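I could use a condition token filter to check whether the word contains an apostrophe (I will normalize all apostrophe variants beforehand with a character filter), and in that case apply synonyms or some other filter.
Is there a way to add synonyms dynamically at index/query time, depending on whether a certain character is found in the string?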
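Your solution is the multiplexer filter. It emits several versions of each token at the same position, each version passed through a different filter.
Here is a mapping that combines the condition filter with the multiplexer: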
PUT /dynamic_synonyms
{
"settings": {
"analysis": {
"analyzer": {
"dynamic_synonym_analyzer": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"elision_detect_filter"
]
}
},
"filter": {
"dynamic_synonym_filter": {
"type": "multiplexer",
"filters": [
"apostroph_remove_filter",
"lowercase",
"apostroph_space_replace_filter"
]
},
"apostroph_space_replace_filter": {
"type": "pattern_replace",
"pattern": "'",
"replacement": " "
},
"apostroph_remove_filter": {
"type": "pattern_replace",
"pattern": "'",
"replacement": ""
},
"elision_detect_filter": {
"type": "condition",
"filter": [
"dynamic_synonym_filter"
],
"script": {
"source": """token.term.toString().startsWith('l\'')"""
}
}
}
}
}
}
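The lowercase filter inside dynamic_synonym_filter acts as a no-op: the analyzer has already lowercased every token, so it simply passes the token through unchanged.
Note that the condition script only matches tokens that start with l'. To react to any apostrophe, as described in the question, the predicate could test token.term.toString().contains("'") instead.
Analyze a string: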
POST /dynamic_synonyms/_analyze
{
"analyzer" : "dynamic_synonym_analyzer",
"text" : "l'ami bon"
}
Response:
{
"tokens" : [
{
"token" : "l'ami",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "lami",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "l ami",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "bon",
"start_offset" : 6,
"end_offset" : 9,
"type" : "word",
"position" : 1
}
]
}
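To see why exactly these tokens come out, the chain whitespace → lowercase → condition(multiplexer) can be modelled in a few lines of Python. This is a simplified sketch, not Elasticsearch code: the names Token, whitespace_tokenize, multiplex and analyze are hypothetical, and the deduplication step assumes the multiplexer's behavior of dropping identical tokens emitted at the same position (which is why the no-op lowercase branch does not produce a second l'ami).

```python
# Simplified, hypothetical model of the "dynamic_synonym_analyzer" chain.
# It is NOT Elasticsearch code; it only mimics the token flow.
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Token:
    term: str
    start_offset: int
    end_offset: int
    position: int


def whitespace_tokenize(text: str) -> List[Token]:
    """Split on whitespace, keeping character offsets and positions."""
    tokens: List[Token] = []
    cursor = 0
    for position, word in enumerate(text.split()):
        start = text.index(word, cursor)
        end = start + len(word)
        tokens.append(Token(word, start, end, position))
        cursor = end
    return tokens


def multiplex(token: Token, filters: List[Callable[[str], str]],
              preserve_original: bool = True) -> List[Token]:
    """Emit one variant per filter at the same position, dropping duplicates."""
    terms = [token.term] if preserve_original else []
    terms += [f(token.term) for f in filters]
    unique: List[str] = []
    for term in terms:
        if term not in unique:  # identical tokens at one position are removed
            unique.append(term)
    return [Token(t, token.start_offset, token.end_offset, token.position)
            for t in unique]


def analyze(text: str) -> List[Token]:
    branches = [
        lambda t: t.replace("'", ""),   # apostroph_remove_filter
        str.lower,                      # lowercase: a no-op at this point
        lambda t: t.replace("'", " "),  # apostroph_space_replace_filter
    ]
    out: List[Token] = []
    for tok in whitespace_tokenize(text):
        # "lowercase" filter of the analyzer
        tok = Token(tok.term.lower(), tok.start_offset, tok.end_offset, tok.position)
        if tok.term.startswith("l'"):   # elision_detect_filter predicate
            out.extend(multiplex(tok, branches))
        else:
            out.append(tok)
    return out


for t in analyze("l'ami bon"):
    print(t.term, t.start_offset, t.end_offset, t.position)
```

Running it prints l'ami, lami and l ami at position 0 with offsets 0 to 5, followed by bon at position 1, which matches the _analyze response above.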