Suppose I have an index that looks like this:
[{:index=>
   {:_index=>"candidates",
    :_id=>"a1786607-e095-4621-bdf9-de2706475614",
    :data=>
     {:name=>"Carli Stark",
      :is_verified=>true,
      :has_work_permit=>true}}},
 {:index=>
   {:_index=>"candidates",
    :_id=>"57f78d3f-392e-4cdf-a5ff-6d10e7c89d5b",
    :data=>
     {:name=>"Gayla Keeling",
      :is_verified=>false,
      :has_work_permit=>true}}}]
I query it with score_mode sum and boost_mode replace (because I only want my own relevance score to count):
GET candidates/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "filter": {
            "term": {
              "is_verified": true
            }
          },
          "weight": 1000
        },
        {
          "filter": {
            "term": {
              "has_work_permit": true
            }
          },
          "weight": 100000000000
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  },
  "_source": ["is_verified"],
  "size": 50
}
Why, then, does Elasticsearch return exactly the same score for both documents? (Note also that the order is wrong.)
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 9.9999998E10,
    "hits" : [
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "cbd1b70b-f889-4136-a43e-f6782955f58e",
        "_score" : 9.9999998E10,
        "_source" : {
          "is_verified" : false,
          "has_work_permit" : true
        }
      },
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "d644a5e5-09e0-496e-8830-c1a772c46611",
        "_score" : 9.9999998E10,
        "_source" : {
          "is_verified" : true,
          "has_work_permit" : true
        }
      }
    ]
  }
}
If I use a larger weight (for example 10000 instead of 1000), the scores differ, as expected:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.00000006E11,
    "hits" : [
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "d644a5e5-09e0-496e-8830-c1a772c46611",
        "_score" : 1.00000006E11,
        "_source" : {
          "is_verified" : true,
          "has_work_permit" : true
        }
      },
      {
        "_index" : "candidates_production_20240828114811152",
        "_type" : "_doc",
        "_id" : "cbd1b70b-f889-4136-a43e-f6782955f58e",
        "_score" : 9.9999998E10,
        "_source" : {
          "is_verified" : false,
          "has_work_permit" : true
        }
      }
    ]
  }
}
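Both responses are consistent with `_score` being a 32-bit float, so that a small addend can be lost when it is summed with a very large one. That is a likely explanation rather than something the responses themselves prove. A minimal Python sketch that rounds the two sums to single precision (the helper name `to_float32` is made up for this illustration):

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


PERMIT_WEIGHT = 100_000_000_000  # weight of the has_work_permit function

for verified_weight in (1_000, 10_000):
    only_permit = to_float32(PERMIT_WEIGHT)
    both = to_float32(PERMIT_WEIGHT + verified_weight)
    # Show whether the smaller weight survives the rounding to float32.
    print(verified_weight, only_permit, both, both != only_permit)
```

Near 1e11, adjacent 32-bit floats are 8192 apart, so adding 1000 rounds back to the same value (99999997952, shown as 9.9999998E10), while adding 10000 reaches the next one (100000006144, shown as 1.00000006E11).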
But how can I make this exact? No matter how large the score gets, I need the smaller weight to count toward it.
My Elasticsearch version is 7.10.1 (AWS ES).
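
One possible direction, sketched under the assumption that only the ordering matters and not the absolute score values: keep the weights small enough that every possible sum is an integer below 2^24 (16,777,216), which a 32-bit float represents exactly. The weights 2 and 1 below are hypothetical, as are the helper names `build_query` and `sums_are_exact`; the query body mirrors the one above with only the weights changed.

```python
import itertools
import json
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


# Hypothetical weights: the higher-priority flag outweighs the lower one,
# yet every sum stays a small integer.
WEIGHTS = {"has_work_permit": 2, "is_verified": 1}


def build_query(weights: dict) -> dict:
    """Build a function_score body with one weighted term filter per field."""
    functions = [
        {"filter": {"term": {field: True}}, "weight": weight}
        for field, weight in weights.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "score_mode": "sum",
                "boost_mode": "replace",
            }
        }
    }


def sums_are_exact(weights: dict) -> bool:
    """Check that every on/off combination of weights survives float32 rounding."""
    values = list(weights.values())
    for flags in itertools.product((0, 1), repeat=len(values)):
        total = sum(flag * value for flag, value in zip(flags, values))
        if to_float32(total) != total:
            return False
    return True


print(json.dumps(build_query(WEIGHTS), indent=2))
print(sums_are_exact(WEIGHTS))
```

With more flags, powers of two (1, 2, 4, ...) keep each higher-priority flag above the sum of all lower ones, and the sums remain exact for up to 24 such flags.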