Elasticsearch query that returns the difference between two searches

Question  Votes: 0  Answers: 1

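What would a query for the following look like?

Scenario:

I have two bases (base 1 and base 2), each with a single column, and I want to see the difference between them, that is, what exists in base 1 but does not exist in base 2. Assume the column is called hostname (a made-up name).

Example:

Is the selected value of Base1.Hostname present in Base2.Hostname?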

YES → DO NOT RETURN
NO  → RETURN

I have the following function in Python:

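def diff(first, second):
    # Return the items of first that are not in second, keeping first's order.
    second = set(second)  # set gives fast membership checks
    return [item for item in first if item not in second]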

An example match query looks like this:

GET /base1/_search
{
  "query": {
    "multi_match": {
      "query": "webserver",
      "fields": [
        "hostname"
      ],
      "type": "phrase"
    }
  }
}

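I want to migrate this logic to Elasticsearch so that, in the future, I can generate forecasts based on how often the results of these searches change.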

elasticsearch search diff
1 Answer
0
votes

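This can be done with aggregations.

  1. Collect all hostnames from the base1 and base2 indices
  2. For each hostname, count how many times it occurs in base1 and in base2
  3. Keep only the buckets whose count is 1 in base1 and 0 in base2 (if a hostname can occur more than once in base1, test for a base1 count greater than 0 instead)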
GET base*/_search
{
  "size": 0,
  "aggs": {
    "all": {
      "composite": {
        "size": 10, 
        "sources": [
          {
            "host": {
              "terms": {
                "field": "hostname"
              }
            }
          }
        ]
      },
      "aggs": {
        "base1": {
          "filter": {
            "match": {
              "_index": "base1"
            }
          }
        },
        "base2": {
          "filter": {
            "match": {
              "_index": "base2"
            }
          }
        },
        "index_count_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "base1_count": "base1._count",
              "base2_count": "base2._count"
            },
            "script": "params.base1_count == 1 && params.base2_count == 0"
          }
        }
      }
    }
  }
}
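
To make the three steps above concrete, here is a minimal in-memory sketch in Python of what the aggregation computes. It is not Elasticsearch code: the function name hosts_only_in_base1, the exact_one flag, and the sample hostnames are all illustrative assumptions. By default it mirrors the bucket_selector script (count of exactly 1 in base1 and 0 in base2).

```python
from collections import Counter


def hosts_only_in_base1(base1_hosts, base2_hosts, exact_one=True):
    """Mimic the composite + bucket_selector logic on plain lists.

    base1_hosts / base2_hosts stand in for the hostname values stored
    in the base1 and base2 indices (hypothetical sample data).
    """
    # Steps 1 and 2: one "bucket" per hostname, with a count per index.
    base1_count = Counter(base1_hosts)
    base2_count = Counter(base2_hosts)
    all_hosts = sorted(set(base1_count) | set(base2_count))

    # Step 3: keep only the buckets that satisfy the selector condition.
    kept = []
    for host in all_hosts:
        c1, c2 = base1_count[host], base2_count[host]
        in_base1 = (c1 == 1) if exact_one else (c1 > 0)
        if in_base1 and c2 == 0:
            kept.append(host)
    return kept


base1 = ["web01", "web02", "db01", "db01"]
base2 = ["web02", "cache01"]
print(hosts_only_in_base1(base1, base2))                   # ['web01']
print(hosts_only_in_base1(base1, base2, exact_one=False))  # ['db01', 'web01']
```

Note how db01, which occurs twice in base1, is dropped by the strict "== 1" condition and only shows up with the looser "> 0" condition.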

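By the way, the composite aggregation returns only "size" buckets per request (10 in the query above), so don't forget to paginate to get the rest of the results: pass the after_key from each response as the "after" parameter of the next request.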
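
A possible pagination loop, sketched in Python. The helper name iter_composite_buckets and the "search" callable are assumptions made for illustration: "search" stands for whatever function sends a request body to Elasticsearch and returns the parsed JSON response. The sketch also assumes the response carries an after_key while more pages remain and omits it once the buckets are exhausted. A fake two-page backend is used here so the example runs without a cluster.

```python
import copy


def iter_composite_buckets(search, body, agg_name="all"):
    """Yield every bucket of a composite aggregation, page by page.

    `search` is any callable taking a request body (dict) and returning
    the parsed response (dict); it is a placeholder for a real client.
    """
    body = copy.deepcopy(body)  # do not mutate the caller's request
    while True:
        agg = search(body)["aggregations"][agg_name]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        if after_key is None:
            break  # no after_key: nothing left to fetch
        # Resume after the last bucket of the previous page.
        body["aggs"][agg_name]["composite"]["after"] = after_key


# Fake two-page backend standing in for Elasticsearch in this demo.
def fake_search(body):
    after = body["aggs"]["all"]["composite"].get("after")
    if after is None:
        return {"aggregations": {"all": {
            "after_key": {"host": "web01"},
            "buckets": [{"key": {"host": "web01"}}]}}}
    if after == {"host": "web01"}:
        return {"aggregations": {"all": {
            "after_key": {"host": "web09"},
            "buckets": [{"key": {"host": "web09"}}]}}}
    return {"aggregations": {"all": {"buckets": []}}}


request = {"size": 0,
           "aggs": {"all": {"composite": {"size": 1, "sources": []}}}}
hosts = [b["key"]["host"]
         for b in iter_composite_buckets(fake_search, request)]
print(hosts)  # ['web01', 'web09']
```

The loop stops on a missing after_key rather than on an empty page, because a bucket_selector may filter every bucket out of a page that is not the last one.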

References:

  1. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
  2. https://discuss.elastic.co/t/data-set-difference-between-fields-on-different-indexes/160015/4