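I'm trying to figure out how to solve this in OpenSearch (an Elasticsearch solution would be fine too).
Essentially, I have an index of jobs, and I want to sort them by two parameters with equal weight: subscription tier and popularity score (each is a field in every job document).
Normally when you sort, you sort by one field and then by the other. What I need is to blend the two and give each a 50/50 weight.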
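When jobs are sorted by relevance (the default), we want the order to be a combination of the subscription tier and each job's own relevance score, weighted by w, following a formula like this:
Jobs will be ranked by a weighted score.
weighted score = (r1 × w) + (r2 × (1 − w)), where:
r1 = the job's rank in the given search if only relevance is considered; and
r2 = the job's rank in the given search if only subscription is considered.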
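A minimal Python sketch of the rank-based formula above, assuming both criteria are already available client-side as score dicts keyed by job id (the names `ranks`, `weighted_rank_scores`, `relevance` and `tier` are hypothetical). Rank 1 is the best position, so a lower weighted score means a better final position; ties keep input order, which is an arbitrary choice.

```python
def ranks(scores):
    """Map each id to its 1-based rank when sorted by score, highest first."""
    ordered = sorted(scores, key=lambda k: scores[k], reverse=True)
    return {doc_id: i + 1 for i, doc_id in enumerate(ordered)}


def weighted_rank_scores(relevance, tier, w=0.5):
    """weighted score = r1 * w + r2 * (1 - w); lower is better."""
    r1 = ranks(relevance)  # rank when only relevance is considered
    r2 = ranks(tier)       # rank when only subscription is considered
    return {doc_id: r1[doc_id] * w + r2[doc_id] * (1 - w) for doc_id in relevance}


if __name__ == "__main__":
    relevance = {"a": 0.9, "b": 0.5, "c": 0.1}
    tier = {"a": 5, "b": 100, "c": 50}
    scores = weighted_rank_scores(relevance, tier)
    print(sorted(scores, key=scores.get))  # best first
```

Note that every rank depends on the whole candidate set, not on the document alone, which is why this formula does not map directly onto a per-document scoring function.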
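The problem, however, is that I would need to run multiple searches to get each job's rank under each sort criterion, which would be very inefficient. I'm trying to see whether OpenSearch can solve this natively.
For example, I tried to compute it as a script score function using just those two fields, but the fields are completely unrelated and not normalized relative to each other, so giving them equal weight is difficult.
Here's what I've tried so far. First, I added some test documents: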
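To make the scale problem concrete, here is a minimal Python sketch of one common workaround (not something the question prescribes): min-max normalization, which rescales each field to [0, 1] before applying the 50/50 weights. The helper names are hypothetical, and in a real index the min/max would have to come from somewhere, for example a separate aggregation. The values are Jobs 1, 4 and 7 from the test documents in this question.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def blended(bids, pops, w=0.5):
    """w * normalized bid + (1 - w) * normalized popularity."""
    nb = min_max(bids)
    npop = min_max(pops)
    return [w * b + (1 - w) * p for b, p in zip(nb, npop)]


if __name__ == "__main__":
    bids = [100, 5, 50]            # Job 1, Job 4, Job 7
    pops = [0.105, 0.155, 0.186]
    print([b + p for b, p in zip(bids, pops)])  # raw sum: bid dominates
    print(blended(bids, pops))                  # both fields now matter
```

With the raw sum, Job 1 wins purely on its bid; after normalization, Job 7 (mid bid, top popularity) comes first.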
POST _bulk
{"index":{"_index":"tier-sort","_id":"1"}}
{"title":"Job 1","popularity_score":0.105,"bid":100}
{"index":{"_index":"tier-sort","_id":"2"}}
{"title":"Job 2","popularity_score":0.06,"bid":50}
{"index":{"_index":"tier-sort","_id":"3"}}
{"title":"Job 3","popularity_score":0.099,"bid":25}
{"index":{"_index":"tier-sort","_id":"4"}}
{"title":"Job 4","popularity_score":0.155,"bid":5}
{"index":{"_index":"tier-sort","_id":"5"}}
{"title":"Job 5","popularity_score":0.028,"bid":100}
{"index":{"_index":"tier-sort","_id":"6"}}
{"title":"Job 6","popularity_score":0.118,"bid":100}
{"index":{"_index":"tier-sort","_id":"7"}}
{"title":"Job 7","popularity_score":0.186,"bid":50}
{"index":{"_index":"tier-sort","_id":"8"}}
{"title":"Job 8","popularity_score":0.019,"bid":25}
{"index":{"_index":"tier-sort","_id":"9"}}
{"title":"Job 9","popularity_score":0.081,"bid":5}
{"index":{"_index":"tier-sort","_id":"10"}}
{"title":"Job 10","popularity_score":0.124,"bid":100}
{"index":{"_index":"tier-sort","_id":"11"}}
{"title":"Job 11","popularity_score":0.163,"bid":100}
{"index":{"_index":"tier-sort","_id":"12"}}
{"title":"Job 12","popularity_score":0.025,"bid":50}
{"index":{"_index":"tier-sort","_id":"13"}}
{"title":"Job 13","popularity_score":0.16,"bid":25}
{"index":{"_index":"tier-sort","_id":"14"}}
{"title":"Job 14","popularity_score":0.119,"bid":5}
{"index":{"_index":"tier-sort","_id":"15"}}
{"title":"Job 15","popularity_score":0.16,"bid":100}
Then I tried script scoring so that each factor contributes half of the ordering:
GET tier-sort/_search
{
  "size": 100,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": "doc['popularity_score'].value"
          }
        },
        {
          "script_score": {
            "script": "doc['bid'].value"
          }
        }
      ],
      "score_mode": "sum"
    }
  }
}
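The problem, though, is normalization: bid and popularity are on completely different scales. How can this be done in Elasticsearch? Is there a way to do it natively?
Thanks in advance!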
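There are 2 ways to change the ranking of search results in Elasticsearch/OpenSearch:
1. Boosting, which changes the _score
2. Sorting, which sorts on _score by default; but if you specify sort logic other than _score, the boosting logic is ignored and _score is set to null, so only the sort takes effect

I think either approach can solve your problem, as long as the 2 parameters you want to boost/sort on are numeric and can be retrieved from _source or doc_values.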
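For example, you can use Field Value Factor within Function Score to give each field a weight, and then add them up by specifying "score_mode": "sum" (which sums the functions with each other) and "boost_mode": "sum" (which adds that result to the query's _score).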
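A minimal sketch of that Function Score approach, written as a Python dict that could be sent as the body of GET tier-sort/_search. Assumptions: `bid` and `popularity_score` are mapped as numbers, and `max_bid`/`max_pop` are known in advance (here taken from the test data: 100 and 0.186). `field_value_factor` can only multiply a field by `factor`, so dividing by the max scales each field to at most 1 but does not subtract a minimum. The function names are hypothetical; `expected_score` just mirrors the arithmetic locally, with match_all contributing a constant 1.0 that does not change the order.

```python
import json


def field_value_factor_query(max_bid, max_pop, w=0.5):
    """function_score body: each field scaled by 1/max via `factor`,
    weighted via `weight`, functions summed (score_mode), then added
    to the query score (boost_mode)."""
    return {
        "size": 100,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    {
                        "field_value_factor": {
                            "field": "bid",
                            "factor": 1.0 / max_bid,
                            "missing": 0,
                        },
                        "weight": w,
                    },
                    {
                        "field_value_factor": {
                            "field": "popularity_score",
                            "factor": 1.0 / max_pop,
                            "missing": 0,
                        },
                        "weight": 1 - w,
                    },
                ],
                "score_mode": "sum",
                "boost_mode": "sum",
            }
        },
    }


def expected_score(bid, pop, max_bid, max_pop, w=0.5, query_score=1.0):
    """Local mirror of the scoring arithmetic (match_all scores 1.0)."""
    return query_score + w * bid / max_bid + (1 - w) * pop / max_pop


if __name__ == "__main__":
    print(json.dumps(field_value_factor_query(100, 0.186), indent=2))
```

Using "boost_mode": "replace" instead would drop the constant match_all score; with a real text query, "sum" keeps relevance in the mix.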
A script-based sort can also help you reach the goal through sorting.
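A minimal sketch of a script-based sort body, again as a Python dict. It uses the `_script` sort type with a Painless script that blends the two fields, each divided by a max passed in as a script param. Assumptions: both fields are numeric in the mapping, and `max_bid`/`max_pop` are known beforehand (the helper name and param names are hypothetical). As noted above, _score will be null in the response because the sort is not on _score.

```python
import json

# Painless source: w * bid/max_bid + (1 - w) * popularity/max_pop.
# The max values are supplied as params (assumed known, e.g. from an aggregation).
BLEND_SOURCE = (
    "params.w * doc['bid'].value / params.max_bid"
    " + (1 - params.w) * doc['popularity_score'].value / params.max_pop"
)


def script_sort_body(max_bid, max_pop, w=0.5):
    """Search body that sorts by the blended value, highest first."""
    return {
        "size": 100,
        "query": {"match_all": {}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "script": {
                        "lang": "painless",
                        "source": BLEND_SOURCE,
                        "params": {"w": w, "max_bid": max_bid, "max_pop": max_pop},
                    },
                    "order": "desc",
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(script_sort_body(100, 0.186), indent=2))
```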
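If you go with the boosting approach, you can also use the explain API to see in detail how the score is calculated, which can help you debug the query.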
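When "explain": true is set in the search body, each hit carries an `_explanation` tree whose nodes have `value`, `description` and optional `details`. A small hypothetical Python helper that flattens such a tree into indented lines can make it easier to read; the sample tree in the usage below is made up only to show the shape, it is not real output.

```python
def walk_explanation(node, depth=0):
    """Flatten an `_explanation` tree into indented 'value: description' lines."""
    lines = [f"{'  ' * depth}{node['value']}: {node['description']}"]
    for child in node.get("details", []):
        lines.extend(walk_explanation(child, depth + 1))
    return lines


if __name__ == "__main__":
    request_body = {"explain": True, "query": {"match_all": {}}}
    sample = {  # made-up tree, only to show the structure
        "value": 1.5,
        "description": "root",
        "details": [
            {"value": 1.0, "description": "child", "details": []},
            {"value": 0.5, "description": "child2"},
        ],
    }
    print("\n".join(walk_explanation(sample)))
```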