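I have daily statistics records for each shopping mall, with the following fields: cpnTotalCount, orderTotalCount, orderTime, mallId, cpnTotalAmount. I use `bucket_script` to compute, per mall, the ratio of the summed cpnTotalCount to the summed orderTotalCount, and then `bucket_sort` to get the top K.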
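For reference, here is a minimal Python sketch of what the `sum` + `bucket_script` + `bucket_sort` combination computes when every record is visible at once, i.e. the exact answer the query is trying to reproduce. The sample records and the helper name `top_k_by_coupon_ratio` are hypothetical; only the field names come from the mapping above.

```python
from collections import defaultdict

# Hypothetical daily records; only the field names come from the question.
records = [
    {"mallId": "m1", "cpnTotalCount": 30, "orderTotalCount": 100},
    {"mallId": "m1", "cpnTotalCount": 10, "orderTotalCount": 100},
    {"mallId": "m2", "cpnTotalCount": 9, "orderTotalCount": 10},
    {"mallId": "m3", "cpnTotalCount": 5, "orderTotalCount": 50},
]


def top_k_by_coupon_ratio(records, k):
    """Per mall: sum both counts (the two `sum` sub-aggregations), divide them
    (the `bucket_script`), then sort descending and keep k (the `bucket_sort`)."""
    cpn = defaultdict(float)
    orders = defaultdict(float)
    for r in records:
        cpn[r["mallId"]] += r["cpnTotalCount"]
        orders[r["mallId"]] += r["orderTotalCount"]
    ratios = {
        mall: cpn[mall] / orders[mall]
        for mall in cpn
        if orders[mall] > 0  # guard against division by zero (a choice made for this sketch)
    }
    # Ratio descending, mallId ascending as a deterministic tie-break.
    ranked = sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


print(top_k_by_coupon_ratio(records, 2))  # [('m2', 0.9), ('m1', 0.2)]
```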
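But if I select only a 7-day window to get the top-K malls, I cannot get an accurate result because of doc_count_error_upper_bound:

> Document counts (and the results of any sub aggregations) in the terms aggregation are not always accurate. Each shard provides its own view of what the ordered list of terms should be. These are combined to give a final view.

Is there any other way to achieve a better balance between accuracy and performance?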
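A small Python simulation of the sharding effect described in the quote, under simplifying assumptions: each shard reports only its local top `shard_size` terms by doc count, the coordinating node sums whatever it receives, and the error bound is taken as the sum of the last reported count from every shard that truncated its list. This is a hypothetical model, not Elasticsearch's code, and `merged_terms` is an illustrative name. It shows how a term that is frequent overall but never in any single shard's top list can be dropped, and why a larger `shard_size` removes the error. Sub-aggregation sums such as cpnTotalCount are affected in the same way, because a bucket's sums are only combined from the shards that reported that term.

```python
from collections import Counter


def shard_top_terms(shard_counts, shard_size):
    """What one shard sends back: its own top `shard_size` terms by local doc count."""
    return sorted(shard_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:shard_size]


def merged_terms(shards, size, shard_size):
    """Coordinating node: sum the partial counts it received, then keep the top `size`."""
    total = Counter()
    error_bound = 0
    for shard_counts in shards:
        top = shard_top_terms(shard_counts, shard_size)
        for term, count in top:
            total[term] += count
        # A term missing from a truncated list could have had at most the count
        # of the last term that shard did return (simplified error bound).
        if len(shard_counts) > shard_size:
            error_bound += top[-1][1]
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return ranked, error_bound


# Hypothetical doc counts per mallId on two shards; m3 has 16 docs in total.
shards = [{"m1": 10, "m2": 9, "m3": 8}, {"m4": 10, "m5": 9, "m3": 8}]
print(merged_terms(shards, size=2, shard_size=2))  # ([('m1', 10), ('m4', 10)], 18)
print(merged_terms(shards, size=2, shard_size=3))  # ([('m3', 16), ('m1', 10)], 0)
```

With `shard_size=2` the true leader m3 never reaches the coordinating node; once `shard_size` covers every distinct term on each shard, the merged result is exact and the bound drops to 0.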
Any help would be appreciated ;)
The dataset is not very large (in my case it may reach … within a year), so I am trying the following query:

```json
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "orderTime": {
              "from": 1589385600000,
              "to": 1590249599999,
              "include_lower": true,
              "include_upper": true,
              "boost": 1.0
            }
          }
        },
        {
          "range": {
            "cpnTotalCount": {
              "from": 3,
              "to": null,
              "include_lower": true,
              "include_upper": true,
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "aggs": {
    "es_aggs_bucketing": {
      "terms": {
        "field": "mallId",
        "size": 20,
        "shard_size": 10000,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "es_aggs_count_one": {
          "sum": {
            "field": "cpnTotalCount"
          }
        },
        "es_aggs_count_two": {
          "sum": {
            "field": "orderTotalCount"
          }
        },
        "es_aggs_sum_one": {
          "sum": {
            "field": "cpnTotalAmount"
          }
        },
        "es_aggs_script": {
          "bucket_script": {
            "buckets_path": {
              "orderCount": "es_aggs_count_two",
              "couponCount": "es_aggs_count_one"
            },
            "script": {
              "source": "params.couponCount/params.orderCount",
              "lang": "painless"
            },
            "gap_policy": "skip"
          }
        },
        "sort": {
          "bucket_sort": {
            "sort": [
              {
                "es_aggs_script": {
                  "order": "desc"
                }
              }
            ],
            "from": 0,
            "size": 40,
            "gap_policy": "SKIP"
          }
        }
      }
    }
  }
}
```
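One more effect in this query worth keeping in mind: `bucket_sort` runs on the buckets that the `terms` aggregation has already reduced and trimmed to `size` (here the 20 malls with the highest doc count), so the ratio ranking is taken only among those 20, and its own `size: 40` can never return more than 20 buckets. A hypothetical Python sketch of that order of operations follows; the per-mall totals and the name `terms_then_bucket_sort` are made up for illustration.

```python
def terms_then_bucket_sort(per_mall, terms_size, k):
    """per_mall maps mallId -> (doc_count, cpn_sum, order_sum), all hypothetical."""
    # Step 1 (terms, order _count desc then _key asc): keep only `terms_size` buckets.
    kept = sorted(per_mall.items(), key=lambda kv: (-kv[1][0], kv[0]))[:terms_size]
    # Step 2 (bucket_script): compute the ratio for each surviving bucket only.
    ratios = [(mall, cpn / orders) for mall, (_, cpn, orders) in kept]
    # Step 3 (bucket_sort): reorder the survivors by ratio and keep k
    # (mallId ascending is used as a tie-break in this sketch).
    return sorted(ratios, key=lambda kv: (-kv[1], kv[0]))[:k]


per_mall = {"m1": (7, 70, 700), "m2": (7, 35, 100), "m3": (3, 27, 30)}
print(terms_then_bucket_sort(per_mall, terms_size=2, k=2))  # [('m2', 0.35), ('m1', 0.1)]
print(terms_then_bucket_sort(per_mall, terms_size=3, k=2))  # [('m3', 0.9), ('m2', 0.35)]
```

Here m3 has the highest ratio but the fewest documents, so it only appears once `terms_size` is large enough to keep it.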