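Expected: I need all users whose most recent attempt was unsuccessful.
Actual / my approach: I used a terms aggregation on userId with a top_hits sub-aggregation of size 1, sorted by time in descending order.
I have prepared the query below. With it I can get every user along with their latest status. After that I want to filter on the status. Can anyone help with this? I added a post_filter after the aggregation, but the results are still not filtered. If there is any other approach, please share it here.
Input: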
[
{
"userId": "u1",
"status": "Failure",
"time": 1719543600008 // This is most updated record for user - u1
},
{
"userId": "u1",
"status": "Success",
"time": 1719543600007
},
{
"userId": "u1",
"status": "Timeout",
"time": 1719543600006
},
{
"userId": "u2",
"status": "Timeout",
"time": 1719543600004 // This is most updated record for user - u2
},
{
"userId": "u2",
"status": "Failure",
"time": 1719543600003
},
{
"userId": "u3",
"status": "Success",
"time": 1719543600002 // This is the most recent record for user u3. As it is Success, it must be excluded from the output
},
{
"userId": "u3",
"status": "Failure",
"time": 1719543600001
}
]
Expected output:
[
{
"userId": "u1",
"status": "Failure",
"time": 1719543600008
},
{
"userId": "u2",
"status": "Timeout",
"time": 1719543600004
}
]
Query:
{
"query": {
"bool": {
"filter": [
{
"range": {
"data.time": {
"gte": "1719543600000",
"lte": "1719584179015",
"format": "epoch_millis"
}
}
},
{
"query_string": {
"query": "data.type:\"user-stats\""
}
}
]
}
},
"aggs": {
"group_by_userId": {
"terms": {
"field": "data.userId.keyword"
},
"aggs": {
"users_last_status": {
"top_hits": {
"size": 1,
"sort": [
{
"data.time": {
"order": "desc"
}
}
]
}
}
}
}
},
"post_filter": { // In this query this filter is not working
"term": {
"data.status.keyword": "failure"
}
}
}
Actual output:
[
{
"userId": "u1",
"status": "Failure",
"time": 1719543600008
},
{
"userId": "u2",
"status": "Timeout",
"time": 1719543600004
},
{
"userId": "u3", // This should not appear in the output, as we are only concerned with failure records.
"status": "Success",
"time": 1719543600002
}
]
Note: since the number of users is unbounded, we do not want to filter in the application/client, in order to keep the load down.
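post_filter only affects the search hits; it does not affect the aggregations results.
From the documentation: Use the search API's post_filter parameter. Search requests apply post filters only to search hits, not aggregations. You can use a post filter to calculate aggregations based on a broader result set, and then further narrow the results.
https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html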
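To make the quoted behaviour concrete, here is a minimal sketch in plain Python that models the order of operations on the sample input from the question. It is not Elasticsearch client code: the search and latest_per_user helpers are hypothetical names made up for illustration, and the model assumes the main query matches every sample document.

```python
# Plain-Python model of the request flow; NOT Elasticsearch client code.
# Sample documents copied from the question's input.
docs = [
    {"userId": "u1", "status": "Failure", "time": 1719543600008},
    {"userId": "u1", "status": "Success", "time": 1719543600007},
    {"userId": "u1", "status": "Timeout", "time": 1719543600006},
    {"userId": "u2", "status": "Timeout", "time": 1719543600004},
    {"userId": "u2", "status": "Failure", "time": 1719543600003},
    {"userId": "u3", "status": "Success", "time": 1719543600002},
    {"userId": "u3", "status": "Failure", "time": 1719543600001},
]


def latest_per_user(records):
    """Mimic terms(userId) + top_hits(size=1, sort by time desc). Hypothetical helper."""
    latest = {}
    for rec in records:
        current = latest.get(rec["userId"])
        if current is None or rec["time"] > current["time"]:
            latest[rec["userId"]] = rec
    return latest


def search(records, post_filter_status):
    """Hypothetical model: aggregations first, post_filter afterwards on hits only."""
    # Aggregations are computed on everything the main query matched.
    aggregations = latest_per_user(records)
    # The post filter narrows the hits; the aggregations above are untouched.
    hits = [r for r in records if r["status"] == post_filter_status]
    return hits, aggregations


hits, aggregations = search(docs, "Failure")
print(sorted(aggregations))           # users that still have a bucket
print(aggregations["u3"]["status"])   # latest status reported for u3
```

In this model the bucket for u3 survives with status Success even though the hits list only contains Failure documents, which matches the actual output reported in the question.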
You can use a terms query inside query.bool.filter, as shown below.
{
"query":{
"bool":{
"filter":[
{"range":{"data.time":{"gte":"1719543600000","lte":"1719584179015","format":"epoch_millis"}}},
{"query_string":{"query":"data.type:\"user-stats\""}},
{"terms":{"data.status.keyword":["Timeout","Failure"]}}
]
}
},
"aggs": {...}
}
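It may be worth checking this against the sample input, because a clause in query.bool.filter runs before the aggregation groups the documents. The sketch below is again a plain-Python model, not Elasticsearch code; latest_per_user and UNSUCCESSFUL are hypothetical names, and the assumption is that "unsuccessful" means a status of Failure or Timeout.

```python
# Plain-Python model of the sample data; NOT Elasticsearch client code.
rows = [
    ("u1", "Failure", 1719543600008),
    ("u1", "Success", 1719543600007),
    ("u1", "Timeout", 1719543600006),
    ("u2", "Timeout", 1719543600004),
    ("u2", "Failure", 1719543600003),
    ("u3", "Success", 1719543600002),
    ("u3", "Failure", 1719543600001),
]

# Assumption: these two statuses are what the question calls "unsuccessful".
UNSUCCESSFUL = {"Failure", "Timeout"}


def latest_per_user(records):
    """Return {userId: (status, time)} for each user's newest record. Hypothetical helper."""
    latest = {}
    for user, status, time in records:
        if user not in latest or time > latest[user][1]:
            latest[user] = (status, time)
    return latest


# Order A: filter first (like a terms clause in query.bool.filter), then group.
filter_then_group = latest_per_user(r for r in rows if r[1] in UNSUCCESSFUL)

# Order B: group first, then keep users whose newest record is unsuccessful.
group_then_filter = {
    user: record
    for user, record in latest_per_user(rows).items()
    if record[0] in UNSUCCESSFUL
}

print(sorted(filter_then_group))  # users returned when filtering before grouping
print(sorted(group_then_filter))  # users returned when grouping before filtering
```

Under this model, filtering first still produces a bucket for u3 (its older Failure record), while grouping first matches the expected output of u1 and u2. If that difference matters for your data, one option that may be worth exploring is a bucket_selector pipeline aggregation that compares each user's overall max time with the max time of its unsuccessful documents; this is a suggestion only and has not been tested against your index.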