Using nested values in script_score

Question

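I am trying to use nested values in a script score, but I cannot get it to work because I am unable to iterate over the field by accessing it through doc. Also, when I query it in Kibana with

_type:images AND _exists_:colors

it matches no documents, even though the field is clearly present in every document when I look at them individually. I can access the field through params._source, but I have read that this can be slow and is therefore not recommended.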

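I know this problem comes entirely from the way we created this nested field. If I cannot come up with anything better, I will have to reindex our 2m+ documents and look for another way around the issue. I would like to avoid that, and I also want to better understand how Elastic works behind the scenes and why it behaves this way.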

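The example I give here is not my real-life problem, but it describes the same issue. Imagine we have a document that describes an image. The document has a field holding values for how much red, green, and blue is present in the image.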

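Requests to create the index with a nested field and the documents, each containing a colors array in which 100 points are split between the colors: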

PUT images
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id" : { "type" : "integer" },
        "title" : { "type" : "text" },
        "description" : { "type" : "text" },
        "colors": {
          "type": "nested",
          "properties": {
            "red": {
              "type": "double"
            },
            "green": {
              "type": "double"
            },
            "blue": {
              "type": "double"
            }
          }
        }
      }
    }
  }
}

PUT images/_doc/1
{
    "id" : 1,
    "title" : "Red Image",
    "description" : "Description of Red Image",
    "colors": [
      {
        "red": 100
      },
      {
        "green": 0
      },
      {
        "blue": 0
      }
    ]
}

PUT images/_doc/2
{
    "id" : 2,
    "title" : "Green Image",
    "description" : "Description of Green Image",
    "colors": [
      {
        "red": 0
      },
      {
        "green": 100
      },
      {
        "blue": 0
      }
    ]
}

PUT images/_doc/3
{
    "id" : 3,
    "title" : "Blue Image",
    "description" : "Description of Blue Image",
    "colors": [
      {
        "red": 0
      },
      {
        "green": 0
      },
      {
        "blue": 100
      }
    ]
}

Now, if I run this query using doc:

GET images/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                boolean debug = true;
                for(color in doc["colors"]) {
                  if (debug === true) {
                    throw new Exception(color["red"].toString());
                  }
                }
              """
            }
          }
        }
      ]
    }
  }
}

I get the exception

No field found for [colors] in mapping with types []

but if I use params._source instead, like this:

GET images/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                boolean debug = true;
                for(color in params._source["colors"]) {
                  if (debug === true) {
                    throw new Exception(color["red"].toString());
                  }
                }
              """
            }
          }
        }
      ]
    }
  }
}

I am able to output

"caused_by": {"type": "exception", "reason": "100"}

so I know it works, because the first document is red and has a value of 100.

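I am not even sure this qualifies as a question; it is more a request for help. If someone could explain why this happens and suggest the best way to solve it, I would really appreciate it.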

(Also, some tips for debugging in Painless would be lovely!)

Answer 1

In an Elasticsearch scoring script,

"script_score": {"script": {"source": "..." }}

you can access nested values through the params._source object.

For example, suppose you have a documents index containing documents like this:

{
  "title": "Yankees Potential Free Agent Target: Max Scherzer",
  "body": "...",
  "labels": {
    "genres": "news",
    "topics": ["sports", "celebrities"],
    "publisher": "CNN"
  }
}

The following query returns 100 documents in random order, giving preference to documents that have the sports topic:

GET documents/_search
{
  "size": 100,
  "sort": [
    "_score"
  ],
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "random_score": {}
        },
        {
          "script_score": {
            "script": {
              "source": """
                double boost = 1.0;
                if (params._source['labels'] != null && params._source['labels']['topics'] != null && params._source['labels']['topics'].contains('sports')) {
                    boost += 2.0;
                }
                return boost;
              """
            }
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "replace"
    }
  }
}

Answer 2

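Don't worry about params._source being slow. It is your only option here, because iterating doc in the nested context only gives access to a single nested color.

Try this: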

GET images/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "image"
          }
        },
        {
          "function_score": {
            "functions": [
              {
                "script_score": {
                  "script": {
                    "source": """
                        def score = 0;
                        for (color in params._source["colors"]) {
                          // Debug.explain(color);
                          if (color.containsKey('red')) {
                            score += color['red'] ;
                          }
                        }
                        return score;
                    """
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

See the Painless score context documentation.

Secondly, you were quite close with throwing an exception manually, but there is a cleaner way to do it. Uncomment Debug.explain(color); and you are good to go.
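As a minimal sketch of that debugging approach, the request below calls Debug.explain on the whole colors array from _source instead of throwing a custom exception. It assumes the images index from the question. The trailing return is only there to give the script a score value and is never reached, because Debug.explain always throws. The response should be an error that reports the runtime class and string form of the value; the exact shape of that error may differ between Elasticsearch versions.

```json
GET images/_search
{
  "query": {
    "function_score": {
      "script_score": {
        "script": {
          "source": "Debug.explain(params._source['colors']); return 0;"
        }
      }
    }
  }
}
```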

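One more thing: I deliberately added a match query to boost the scores but, more importantly, to illustrate how the query is built behind the scenes. When you rerun the above under GET images/_validate/query?explain you will see for yourself.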
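A shortened sketch of such a validation request is shown here. Only the match clause is kept, for brevity; the full bool query from above can be sent the same way. The response should contain an explanations entry describing the rewritten query for the index.

```json
GET images/_validate/query?explain
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "image" } }
      ]
    }
  }
}
```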

Answer 3

I am not sure exactly what you are trying to achieve.

I think you can use a nested query with script_score, like this:

GET images/_search
{
    "query": {
        "nested": {
            "path": "colors",
            "query": {
                "bool": {
                    "must": [{
                        "exists": {
                            "field": "colors.red"
                        }
                    }, {
                        "function_score": {
                            "script_score": {
                                "script": "doc['colors.red'].value"
                            }
                        }
                    }]
                }
            }
        }
    }
}
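Each entry in colors is indexed as its own hidden document, so a nested query has to combine the per-entry scores into one score for the parent image. The variant below is a sketch that sets score_mode explicitly. Choosing sum is an assumption about the intent (the nested query defaults to avg), and boost_mode replace is added so that each entry's score is exactly the script result.

```json
GET images/_search
{
  "query": {
    "nested": {
      "path": "colors",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": { "exists": { "field": "colors.red" } },
          "script_score": {
            "script": { "source": "doc['colors.red'].value" }
          },
          "boost_mode": "replace"
        }
      }
    }
  }
}
```

With the three sample documents, only the entry that carries a red value matches in each image, so the parent score comes from that single entry.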
