我有一个 Elasticsearch 索引,用于存储具有以下字段的文档:
我需要从最新且优先级最高的文档中找到 5 个唯一的 user_id 值。如果我使用 SQL,我会通过以下方式解决这个问题:
SELECT DISTINCT user_id
FROM (
SELECT user_id
FROM documents
ORDER BY timestamp DESC, priority DESC
) AS sorted_docs
LIMIT 5;
但是,我不确定如何在 Elasticsearch 中执行类似的操作。我已经研究过使用术语和复合聚合,但我不清楚:
有人可以指导我如何构建此查询或提供示例吗?任何帮助或参考将不胜感激!
您尚未共享任何示例数据,但根据您提供的信息,我创建了示例数据并提供了示例查询。
您可以使用 Elasticsearch 的 Collapse 功能来获取 user_id 的不同值。
让我们考虑以下是您的索引映射:
PUT test
{
"mappings": {
"properties": {
"user_id":{
"type": "keyword"
},
"timestamp":{
"type": "date"
},
"priority":{
"type": "integer"
}
}
}
}
以下是示例数据:
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "user_id" : "101", "timestamp" : "2024-11-05T12:10:30Z","priority" : "1"}
{ "index" : { "_index" : "test", "_id" : "2" } }
{ "user_id" : "101", "timestamp" : "2024-11-05T16:10:30Z","priority" : "2"}
{ "index" : { "_index" : "test", "_id" : "3" } }
{ "user_id" : "102", "timestamp" : "2024-11-04T12:10:30Z","priority" : "3"}
{ "index" : { "_index" : "test", "_id" : "4" } }
{ "user_id" : "102", "timestamp" : "2024-11-05T08:10:30Z","priority" : "4"}
{ "index" : { "_index" : "test", "_id" : "5" } }
{ "user_id" : "103", "timestamp" : "2024-11-05T12:10:30Z","priority" : "5"}
{ "index" : { "_index" : "test", "_id" : "6" } }
{ "user_id" : "103", "timestamp" : "2024-11-03T12:10:30Z","priority" : "6"}
{ "index" : { "_index" : "test", "_id" : "7" } }
{ "user_id" : "104", "timestamp" : "2024-11-05T12:10:30Z","priority" : "7"}
{ "index" : { "_index" : "test", "_id" : "8" } }
{ "user_id" : "105", "timestamp" : "2024-11-04T12:10:30Z","priority" : "8"}
{ "index" : { "_index" : "test", "_id" : "9" } }
{ "user_id" : "105", "timestamp" : "2024-11-01T12:10:30Z","priority" : "9"}
{ "index" : { "_index" : "test", "_id" : "10" } }
{ "user_id" : "106", "timestamp" : "2024-11-02T12:10:30Z","priority" : "10"}
您可以使用下面的查询,它将按时间戳和优先级排序,并按 user_id 折叠,这将确保您获得所有唯一的用户 ID:
POST test/_search
{
"size": 5,
"sort": [
{
"timestamp": {
"order": "desc"
}
},
{
"priority": {
"order": "desc"
}
}
],
"query": {
"match_all": {}
},
"collapse": {
"field": "user_id"
}
}
以下将回复上述询问:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "test",
"_id": "2",
"_score": null,
"_source": {
"user_id": "101",
"timestamp": "2024-11-05T16:10:30Z",
"priority": "2"
},
"fields": {
"user_id": [
"101"
]
},
"sort": [
1730823030000,
2
]
},
{
"_index": "test",
"_id": "7",
"_score": null,
"_source": {
"user_id": "104",
"timestamp": "2024-11-05T12:10:30Z",
"priority": "7"
},
"fields": {
"user_id": [
"104"
]
},
"sort": [
1730808630000,
7
]
},
{
"_index": "test",
"_id": "5",
"_score": null,
"_source": {
"user_id": "103",
"timestamp": "2024-11-05T12:10:30Z",
"priority": "5"
},
"fields": {
"user_id": [
"103"
]
},
"sort": [
1730808630000,
5
]
},
{
"_index": "test",
"_id": "4",
"_score": null,
"_source": {
"user_id": "102",
"timestamp": "2024-11-05T08:10:30Z",
"priority": "4"
},
"fields": {
"user_id": [
"102"
]
},
"sort": [
1730794230000,
4
]
},
{
"_index": "test",
"_id": "8",
"_score": null,
"_source": {
"user_id": "105",
"timestamp": "2024-11-04T12:10:30Z",
"priority": "8"
},
"fields": {
"user_id": [
"105"
]
},
"sort": [
1730722230000,
8
]
}
]
}
}
我认为 10 是最高优先级,因为您没有在问题中澄清。如果您希望 1 作为最高优先级,请将排序中的优先级顺序从
asc
更改为 desc
。