Is there a way to give more weight to terms that appear near the beginning of a document? For example, I have 3 documents.
Medicine XXX
Sulpher This medicine contains sulpher and should be taken only after consultation with your doctor.
Medicine YYY
contains: sulpher Not recommended by most physicians
Medicine ZZZ
This medicine works like sulpher but does not contain sulpher at all.
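For the search term "Sulpher", document XXX should be listed first, because that is the first word of the document. It is also fine if YYY is listed at the top, since it is in the same situation as XXX. But ZZZ should always come last. In other words, a term found toward the "left" of a document should get higher priority than a term found toward the "right".
You could boost by the term's position in the lowercase-normalized text.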
PUT sulphur
{
  "settings": {
    "analysis": {
      "normalizer": {
        "keyword_lowercase": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "keyword_lowercase"
          }
        }
      }
    }
  }
}
POST sulphur/_doc
{"text":"This medicine works like sulpher but does not contain sulpher at all."}
POST sulphur/_doc
{"text":"contains: sulpher Not recommended by most physicians"}
POST sulphur/_doc
{"text":"Sulpher This medicine contains sulpher and should be taken only after consultation with your doctor."}
Then...
GET sulphur/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "match": {
                "text": "sulpher"
              }
            },
            "script_score": {
              "script": """
                def pos = doc['text.keyword'].value.indexOf('sulpher');
                return Math.exp((2.0/(pos+1)));
              """
            },
            "boost_mode": "replace"
          }
        }
      ]
    }
  }
}
yielding
[
  {
    "_index":"sulphur",
    "_type":"_doc",
    "_id":"sf5S2nEBW-D5QnrWODvB",
    "_score":7.389056,
    "_source":{
      "text":"Sulpher This medicine contains sulpher and should be taken only after consultation with your doctor."
    }
  },
  {
    "_index":"sulphur",
    "_type":"_doc",
    "_id":"sP5S2nEBW-D5QnrWNjtw",
    "_score":1.1993961,
    "_source":{
      "text":"contains: sulpher Not recommended by most physicians"
    }
  },
  {
    "_index":"sulphur",
    "_type":"_doc",
    "_id":"r_5S2nEBW-D5QnrWNDuw",
    "_score":1.079959,
    "_source":{
      "text":"This medicine works like sulpher but does not contain sulpher at all."
    }
  }
]
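To see where those scores come from, here is a rough Python sketch of the same arithmetic the script performs: take the character offset of the first occurrence of the term in the lowercased text and compute exp(2 / (pos + 1)). The function name `position_score` is made up for illustration, and the guard for a missing term is my own assumption; the Painless script above has no such guard and relies on the match query to only pass documents that contain the term.

```python
import math


def position_score(text: str, term: str) -> float:
    """Mimic the script: exp(2 / (pos + 1)), pos = offset of the first hit."""
    # The keyword sub-field is lowercase-normalized, so compare in lowercase.
    pos = text.lower().find(term.lower())
    if pos < 0:
        # Assumption for this sketch only: score 0.0 when the term is absent.
        return 0.0
    return math.exp(2.0 / (pos + 1))


docs = [
    "This medicine works like sulpher but does not contain sulpher at all.",
    "contains: sulpher Not recommended by most physicians",
    "Sulpher This medicine contains sulpher and should be taken only after "
    "consultation with your doctor.",
]

# Rank the documents the way the query does: highest score first.
ranked = sorted(docs, key=lambda d: position_score(d, "sulpher"), reverse=True)

for doc in ranked:
    print(f"{position_score(doc, 'sulpher'):.6f}  {doc[:30]}")
```

The offsets are 0, 10 and 25 characters, so the scores are exp(2), exp(2/11) and exp(2/26), which is why the first result stands out so sharply and the other two are close together. Note that the offset is measured in characters, not in words, so a long leading word pushes the score down more than a short one.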