Filtering an array of objects in a document with Elasticsearch 6, removing the non-matching objects


In Elasticsearch 6.0, I created an index with a nested mapping type:

PUT node2
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 0,
            "mapping.total_fields.limit" : "2300"
        }
    },
    "mappings": {
      "content": {
        "properties": {
          "basicPageBodyParagraphs": {
            "type": "nested"
          }
        }
      }
    }
}

where basicPageBodyParagraphs is an array of objects. Documents in this index look like this:

{
  "id": "16dfb723-dac7-47cd-a898-47d9bd054c09",
  "fields": null,
  "more": null,
  "pathAlias": "about/sdc-access-test-2",
  "status": 0,
  "basicPageBodyParagraphs": [
    {
      "fabTextContent": "<p>This is a full text paragraph. The page is restricted to students. This paragraph has no further restrictions, so should be visible to all students.</p>",
      "paragraphAccessRoles": [],
      "type": "fab-text"
    },
    {
      "fabTextContent": "<p>This is a second full text paragraph. This one is restricted to students studying Biology.</p>",
      "paragraphAccessRoles": ["155eccdf-5ea0-ec11-8135-00155dfb7c0d"],
      "type": "fab-text"
    },
    {
      "type": "bullets",
      "items": [
        {
          "content": "<p>This is the first bullet point.</p>"
        },
        {
          "content": "<p>This is the second bullet point.</p>"
        }
      ],
      "title": "This is a bullets paragraph with access restrictions",
      "alignment": "left",
      "paragraphAccessRoles": ["4efd1649-ba34-eb11-810c-005056930a83"]
    }
  ],
  "contentType": "basic_page"
}

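I want to be able to query my index and retrieve basicPageBodyParagraphs based on paragraphAccessRoles, so that if the student's ID is 155eccdf-5ea0-ec11-8135-00155dfb7c0d, the query returns a document containing only the following: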

"basicPageBodyParagraphs": [
    {
      "fabTextContent": "<p>This is a full text paragraph. The page is restricted to students. This paragraph has no further restrictions, so should be visible to all students.</p>",
      "paragraphAccessRoles": [],
      "type": "fab-text"
    },
    {
      "fabTextContent": "<p>This is a second full text paragraph. This one is restricted to students studying Biology.</p>",
      "paragraphAccessRoles": ["155eccdf-5ea0-ec11-8135-00155dfb7c0d"],
      "type": "fab-text"
    }
]

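In other words, the first paragraph is returned because it has no paragraphAccessRoles, the second paragraph is returned because its paragraphAccessRoles match the student ID, and the third paragraph is not returned because its paragraphAccessRoles do not match the student ID (155eccdf-5ea0-ec11-8135-00155dfb7c0d).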

To do this, I am using the following query:

POST /node2/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "pathAlias.keyword": "about/sdc-access-test-2"
          }
        },
        {
          "nested": {
            "path": "basicPageBodyParagraphs",
            "query": {
              "bool": {
                "should": [
                  {
                    "terms": {
                      "basicPageBodyParagraphs.paragraphAccessRoles.keyword": [
                        "155eccdf-5ea0-ec11-8135-00155dfb7c0d"
                      ]
                    }
                  },
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "basicPageBodyParagraphs.paragraphAccessRoles"
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}

This query returns part of what I want:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 7.4361506,
    "hits": [
      {
        "_index": "node2",
        "_type": "content",
        "_id": "16dfb723-dac7-47cd-a898-47d9bd054c09",
        "_score": 7.4361506,
        "_source": {
          "id": "16dfb723-dac7-47cd-a898-47d9bd054c09",
          "fields": null,
          "more": null,
          "pathAlias": "about/sdc-access-test-2",
          "status": 0,
          "basicPageBodyParagraphs": [
            {
              "fabTextContent": "<p>This is a full text paragraph. The page is restricted to students. This paragraph has no further restrictions, so should be visible to all students.</p>",
              "paragraphAccessRoles": [],
              "type": "fab-text"
            },
            {
              "fabTextContent": "<p>This is a second full text paragraph. This one is restricted to students studying Biology.</p>",
              "paragraphAccessRoles": ["155eccdf-5ea0-ec11-8135-00155dfb7c0d"],
              "type": "fab-text"
            },
            {
              "type": "bullets",
              "items": [
                {
                  "content": "<p>This is the first bullet point.</p>"
                },
                {
                  "content": "<p>This is the second bullet point.</p>"
                }
              ],
              "title": "This is a bullets paragraph with access restrictions",
              "alignment": "left",
              "paragraphAccessRoles": ["4efd1649-ba34-eb11-810c-005056930a83"]
            }
          ],
          "contentType": "basic_page"
        },
        "inner_hits": {
          "basicPageBodyParagraphs": {
            "hits": {
              "total": 2,
              "max_score": 1,
              "hits": [
                {
                  "_nested": {
                    "field": "basicPageBodyParagraphs",
                    "offset": 1
                  },
                  "_score": 1,
                  "_source": {
                    "fabTextContent": "<p>This is a second full text paragraph. This one is restricted to students studying Biology.</p>",
                    "paragraphAccessRoles": [
                      "155eccdf-5ea0-ec11-8135-00155dfb7c0d"
                    ],
                    "type": "fab-text"
                  }
                },
                {
                  "_nested": {
                    "field": "basicPageBodyParagraphs",
                    "offset": 0
                  },
                  "_score": 1,
                  "_source": {
                    "fabTextContent": "<p>This is a full text paragraph. The page is restricted to students. This paragraph has no further restrictions, so should be visible to all students.</p>",
                    "paragraphAccessRoles": [],
                    "type": "fab-text"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

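The document is returned, but basicPageBodyParagraphs contains all 3 objects rather than the 2 I expected. Further down, however, the returned data includes an inner_hits property that contains the 2 expected paragraphs (although not in their original order: the offsets come back as 1, 0 rather than 0, 1). I do not want the results returned in a property outside the document; I want the query to remove the non-matching basicPageBodyParagraphs objects from the main document.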
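For illustration, the client-side post-processing that the inner_hits response would otherwise require can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name apply_inner_hits is hypothetical, and the hit below is a heavily trimmed stand-in for one entry of hits.hits in the response above, not real output.

```python
from copy import deepcopy
from typing import Any, Dict

NESTED_FIELD = "basicPageBodyParagraphs"


def apply_inner_hits(hit: Dict[str, Any], field: str = NESTED_FIELD) -> Dict[str, Any]:
    """Return a copy of hit["_source"] whose nested array holds only the inner hits.

    Hypothetical helper: it does not change what Elasticsearch returns,
    it only rewrites the response on the client.
    """
    source = deepcopy(hit["_source"])
    inner = (
        hit.get("inner_hits", {})
        .get(field, {})
        .get("hits", {})
        .get("hits", [])
    )
    # Inner hits are sorted by score, so restore the original array order
    # using the offset reported in each "_nested" block.
    ordered = sorted(inner, key=lambda h: h["_nested"]["offset"])
    source[field] = [h["_source"] for h in ordered]
    return source


# Trimmed stand-in for one search hit (assumed shape, see the response above).
hit = {
    "_source": {
        "pathAlias": "about/sdc-access-test-2",
        NESTED_FIELD: [
            {"type": "fab-text", "paragraphAccessRoles": []},
            {"type": "fab-text",
             "paragraphAccessRoles": ["155eccdf-5ea0-ec11-8135-00155dfb7c0d"]},
            {"type": "bullets",
             "paragraphAccessRoles": ["4efd1649-ba34-eb11-810c-005056930a83"]},
        ],
    },
    "inner_hits": {
        NESTED_FIELD: {
            "hits": {
                "hits": [
                    {"_nested": {"field": NESTED_FIELD, "offset": 1},
                     "_source": {"type": "fab-text",
                                 "paragraphAccessRoles": ["155eccdf-5ea0-ec11-8135-00155dfb7c0d"]}},
                    {"_nested": {"field": NESTED_FIELD, "offset": 0},
                     "_source": {"type": "fab-text", "paragraphAccessRoles": []}},
                ]
            }
        }
    },
}

filtered = apply_inner_hits(hit)
```

Two caveats on this sketch: inner_hits returns only a limited number of hits per document by default (3), so a page with more matching paragraphs would need a larger "size" inside "inner_hits"; and the full array is still sent over the wire unless basicPageBodyParagraphs is excluded from _source in the request.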

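Is there a way to have the query filter out the non-matching basicPageBodyParagraphs objects and return only the matching ones in the main document result?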

Tags: elasticsearch, elasticsearch-dsl

1 answer

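I don't think this will work, because of the way arrays of objects are flattened. I haven't seen it done with flattened either, and I doubt it would work there.

My first idea would be to try a parent-child relationship instead. That comes with a performance overhead and makes your queries more complex, but I'm not sure there is another way.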
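The parent-child idea can be sketched as follows, here as Python functions that only build the request bodies. This is a rough sketch under assumptions, not something the answer spells out: it assumes each paragraph is indexed as its own child document through a join field, and the field name pageParagraph, the relation names page and paragraph, and the function names are all hypothetical.

```python
from typing import Any, Dict, List

JOIN_FIELD = "pageParagraph"  # hypothetical join field name


def join_mapping() -> Dict[str, Any]:
    """Index body declaring a page -> paragraph relation (single type "content")."""
    return {
        "mappings": {
            "content": {
                "properties": {
                    JOIN_FIELD: {
                        "type": "join",
                        "relations": {"page": "paragraph"},
                    }
                }
            }
        }
    }


def paragraph_doc(page_id: str, paragraph: Dict[str, Any]) -> Dict[str, Any]:
    """Child document body; it must be indexed with routing set to page_id."""
    doc = dict(paragraph)
    doc[JOIN_FIELD] = {"name": "paragraph", "parent": page_id}
    return doc


def visible_paragraphs_query(path_alias: str, role_ids: List[str]) -> Dict[str, Any]:
    """Search body returning only the child paragraphs a user may see."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Restrict to children of the requested page.
                    {"has_parent": {
                        "parent_type": "page",
                        "query": {"term": {"pathAlias.keyword": path_alias}},
                    }},
                    # Either a matching role, or no roles at all.
                    {"bool": {
                        "should": [
                            {"terms": {"paragraphAccessRoles.keyword": role_ids}},
                            {"bool": {"must_not": [
                                {"exists": {"field": "paragraphAccessRoles"}}
                            ]}},
                        ],
                        "minimum_should_match": 1,
                    }},
                ]
            }
        }
    }


body = visible_paragraphs_query(
    "about/sdc-access-test-2",
    ["155eccdf-5ea0-ec11-8135-00155dfb7c0d"],
)
```

With this layout the hits are the paragraph documents themselves rather than a single page document, so the page would still have to be assembled from its parent and children, which is part of the extra query complexity mentioned above.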
