I'm using Azure OpenAI "on your data" from Python, with Elasticsearch as my data source. Here is the configuration:
```json
{
  "schema": 1.1,
  "description": "testartu",
  "type": "completion",
  "completion": {
    "model": "gpt-4o-mini",
    "completion_type": "chat",
    "include_history": false,
    "include_input": true,
    "data_sources": [
      {
        "type": "elasticsearch",
        "parameters": {
          "endpoint": "https://mytest.es.westeurope.azure.elastic-cloud.com",
          "index_name": "myindex",
          "authentication": {
            "type": "encoded_api_key",
            "encoded_api_key": "xxxxxx"
          },
          "query_type": "simple",
          "roleInformation": "You are a friendly assistant",
          "fields_mapping": {
            "title_field": "titleText",
            "content_fields": ["contentText", "contentText.0", "contentText[0]"]
          }
        }
      }
    ],
    "stop_sequences": null,
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 800,
    "input_max_tokens": 128000
  },
  "augmentation": {
    "augmentation_type": "none"
  }
}
```
The index on Elasticsearch has the following (simplified) format:
```json
{
  "titleText": "Here is my title",
  "contentText": ["Here is a lot of text", "here is even more text"]
}
```
With this configuration, requests against Elasticsearch only search the `titleText` field successfully. `contentText`, which holds all the actual information, is never mapped into the response. My guess is that this happens because `contentText` is an array of strings rather than a single string, and tweaks like `contentText.0` or `contentText[0]` don't work. Any ideas?
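For reference, this is roughly how the same data source block can be passed from Python. This is a minimal sketch: it only builds the request payload (the `openai` package's client accepts extra request fields via `extra_body`, shown here as a comment); the endpoint, index name, and key are the placeholders from the question.

```python
def build_data_sources(es_endpoint: str, index_name: str, encoded_api_key: str) -> list:
    """Build the "data_sources" block for an Azure OpenAI 'on your data' request."""
    return [
        {
            "type": "elasticsearch",
            "parameters": {
                "endpoint": es_endpoint,
                "index_name": index_name,
                "authentication": {
                    "type": "encoded_api_key",
                    "encoded_api_key": encoded_api_key,
                },
                "query_type": "simple",
                "fields_mapping": {
                    "title_field": "titleText",
                    "content_fields": ["contentText"],
                },
            },
        }
    ]

data_sources = build_data_sources(
    "https://mytest.es.westeurope.azure.elastic-cloud.com", "myindex", "xxxxxx"
)
# This payload would then be sent alongside the chat request, e.g.:
# client.chat.completions.create(model=..., messages=...,
#                                extra_body={"data_sources": data_sources})
```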
Wouldn't it be easier to use `_update_by_query` to create a new field that concatenates the `contentText` elements?
```json
POST /<index>/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": "ctx._source.contentTextConcat = String.join(' ; ', ctx._source.contentText)",
    "lang": "painless"
  }
}
```

(`String.join` handles arrays of any length, not just two elements.)
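For the sample document above, the update-by-query script computes the value shown here in plain Python (the `' ; '` separator is just a visible delimiter; any separator works):

```python
# What the script computes per document: join the contentText
# elements into one string on the new contentTextConcat field.
doc = {
    "titleText": "Here is my title",
    "contentText": ["Here is a lot of text", "here is even more text"],
}

doc["contentTextConcat"] = " ; ".join(doc["contentText"])
print(doc["contentTextConcat"])
# Here is a lot of text ; here is even more text
```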
Afterwards, you can put `contentTextConcat` in `content_fields` instead:
"schema": 1.1,
"description": "testartu",
"type": "completion",
"completion": {
"model": "gpt-4o-mini",
"completion_type": "chat",
"include_history": false,
"include_input": true,
"data_sources": [
{
"type": "elasticsearch",
"parameters": {
"endpoint": "https://mytest.es.westeurope.azure.elastic-cloud.com",
"index_name": "myindex",
"authentication": {
"type": "encoded_api_key",
"encoded_api_key": "xxxxxx"
},
"query_type": "simple",
"roleInformation": "You are a friendly assistant",
"fields_mapping": {
"title_field": "titleText",
"content_fields": ["contentTextConcat"]
}
}
}
],
"stop_sequences": null,
"temperature": 0.7,
"top_p": 0.95,
"max_tokens": 800,
"input_max_tokens": 128000
},
"augmentation": {
"augmentation_type": "none"
}
}