Accuracy of sub-aggregations derived from a terms aggregation

Question  Votes: 0  Answers: 1

I have daily statistics records for each shopping mall, with the following fields:

  • cpnTotalCount
  • orderTotalCount
  • orderTime
  • mallId
  • cpnTotalAmount

I use two of these fields in a bucket_script to compute the ratio cpnTotalCount / orderTotalCount, and then a bucket_sort to get the top K.
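To make the intended calculation concrete, here is a minimal Python sketch of what the bucket_script plus bucket_sort pipeline is meant to compute when every document is visible in one place. The records and the helper name top_k_by_ratio are made up for illustration; only the field names come from the question.

```python
from collections import defaultdict

# Hypothetical daily records; field names follow the question, values are made up.
records = [
    {"mallId": "A", "cpnTotalCount": 30, "orderTotalCount": 100},
    {"mallId": "A", "cpnTotalCount": 10, "orderTotalCount": 100},
    {"mallId": "B", "cpnTotalCount": 45, "orderTotalCount": 50},
    {"mallId": "C", "cpnTotalCount": 5, "orderTotalCount": 50},
]


def top_k_by_ratio(rows, k):
    """Sum both counters per mall, compute coupon/order ratio, return the k best."""
    sums = defaultdict(lambda: [0, 0])
    for row in rows:
        sums[row["mallId"]][0] += row["cpnTotalCount"]
        sums[row["mallId"]][1] += row["orderTotalCount"]
    # Malls without orders are left out to avoid dividing by zero
    # (a choice made for this sketch only).
    ratios = {
        mall: coupons / orders
        for mall, (coupons, orders) in sums.items()
        if orders > 0
    }
    # Highest ratio first, ties broken by mall id.
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    for mall, ratio in top_k_by_ratio(records, 2):
        print(mall, ratio)
```

This is the exact answer the query is trying to approximate; the ranking is only as good as the per-mall sums that feed it.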

But if I select only 7 days of data to get the top K malls, I cannot get an accurate result because of doc_count_error_upper_bound. The documentation says:

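"The document counts (and the results of any sub-aggregations) in the terms aggregation are not always accurate. Each shard provides its own view of what the ordered list of terms should be. These views are combined to give a final view."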
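The quoted behaviour can be reproduced with a small simulation. The sketch below is a simplified model, not Elasticsearch's actual implementation: each hypothetical shard returns only its top shard_size terms by document count, the coordinator merges those lists, and the error bound is taken as the sum of the document counts of the last term each truncating shard returned. The shard contents and the function name terms_with_sum are invented for illustration.

```python
from collections import Counter

# Hypothetical data: each shard maps mallId -> list of cpnTotalCount values,
# one value per daily document. All values are made up.
shards = [
    {"A": [5, 5, 5], "B": [1, 1], "C": [9]},
    {"B": [1, 1, 1], "C": [9, 9], "A": [5]},
]


def terms_with_sum(shard_data, size, shard_size):
    """Merge per-shard top lists and sum a sub-aggregation (simplified model)."""
    doc_count = Counter()
    sum_cpn = Counter()
    error_bound = 0
    for shard in shard_data:
        # Each shard ranks its own terms by document count, ties broken by key.
        ranked = sorted(shard.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        returned = ranked[:shard_size]
        if len(ranked) > shard_size:
            # A term this shard left out has at most as many documents
            # as the last term the shard did return.
            error_bound += len(returned[-1][1])
        for key, values in returned:
            doc_count[key] += len(values)
            sum_cpn[key] += sum(values)
    top = sorted(doc_count.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return [(key, count, sum_cpn[key]) for key, count in top], error_bound


if __name__ == "__main__":
    print(terms_with_sum(shards, size=3, shard_size=2))  # truncated shard lists
    print(terms_with_sum(shards, size=3, shard_size=3))  # every term returned
```

With shard_size=2 in this toy data, malls A and C each lose the documents held by the shard that did not return them, so both their document counts and their sums come out too low. With shard_size=3 every shard returns all of its terms and the merged result matches the true totals.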

Is there any other way to strike a better balance between accuracy and performance?

Any help would be appreciated ;)



elasticsearch query-performance elasticsearch-aggregation
1 Answer
0 votes

The dataset is not very large in my case (it may reach [...] within a year), so I am trying the following query, which sets shard_size to 10000 on the terms aggregation:

    {
      "size": 10,
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "orderTime": {
                  "from": 1589385600000,
                  "to": 1590249599999,
                  "include_lower": true,
                  "include_upper": true,
                  "boost": 1.0
                }
              }
            },
            {
              "range": {
                "cpnTotalCount": {
                  "from": 3,
                  "to": null,
                  "include_lower": true,
                  "include_upper": true,
                  "boost": 1.0
                }
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1.0
        }
      },
      "aggs": {
        "es_aggs_bucketing": {
          "terms": {
            "field": "mallId",
            "size": 20,
            "shard_size": 10000,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              { "_count": "desc" },
              { "_key": "asc" }
            ]
          },
          "aggregations": {
            "es_aggs_count_one": { "sum": { "field": "cpnTotalCount" } },
            "es_aggs_count_two": { "sum": { "field": "orderTotalCount" } },
            "es_aggs_sum_one": { "sum": { "field": "cpnTotalAmount" } },
            "es_aggs_script": {
              "bucket_script": {
                "buckets_path": {
                  "orderCount": "es_aggs_count_two",
                  "couponCount": "es_aggs_count_one"
                },
                "script": {
                  "source": "params.couponCount/params.orderCount",
                  "lang": "painless"
                },
                "gap_policy": "skip"
              }
            },
            "sort": {
              "bucket_sort": {
                "sort": [
                  { "es_aggs_script": { "order": "desc" } }
                ],
                "from": 0,
                "size": 40,
                "gap_policy": "SKIP"
              }
            }
          }
        }
      }
    }
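One way to check whether a query like this gave an exact result is to look at doc_count_error_upper_bound in the response. The sketch below assumes the usual response layout for a terms aggregation (a "buckets" list next to "doc_count_error_upper_bound") and for bucket_script (a {"value": ...} object per bucket). The sample response and the helper name summarize_terms_agg are hypothetical, and a bound of 0 is treated only as a hint that no shard cut off its term list, not as a guarantee.

```python
def summarize_terms_agg(response, agg_name="es_aggs_bucketing",
                        ratio_name="es_aggs_script"):
    """Extract the ranking and flag results whose counts may be approximate."""
    agg = response["aggregations"][agg_name]
    error_bound = agg.get("doc_count_error_upper_bound", 0)
    ranking = [
        (bucket["key"], bucket[ratio_name]["value"])
        for bucket in agg["buckets"]
        if ratio_name in bucket  # guard against buckets without a ratio
    ]
    return {
        "possibly_inexact": error_bound > 0,
        "error_bound": error_bound,
        "ranking": ranking,
    }


# Hypothetical response fragment, shaped like the aggregation part of a
# search response; the numbers are made up.
sample_response = {
    "aggregations": {
        "es_aggs_bucketing": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "B", "doc_count": 7, "es_aggs_script": {"value": 0.9}},
                {"key": "A", "doc_count": 7, "es_aggs_script": {"value": 0.2}},
            ],
        }
    }
}

if __name__ == "__main__":
    print(summarize_terms_agg(sample_response))
```

If possibly_inexact comes back True, raising shard_size further (at the cost of more work per shard) is the knob the query above already uses.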
