Azure OpenAI On Your Data with Elasticsearch - field mapping for content

Question (votes: 0, answers: 1)

I'm using Azure OpenAI On Your Data from Python, with Elasticsearch as my data source. This is the configuration:

{
    "schema": 1.1,
    "description": "testartu",
    "type": "completion",
    "completion": {
      "model": "gpt-4o-mini",
      "completion_type": "chat",
      "include_history": false,
      "include_input": true,
      "data_sources": [
        {
          "type": "elasticsearch",
          "parameters": {
            "endpoint": "https://mytest.es.westeurope.azure.elastic-cloud.com",
            "index_name": "myindex",
            "authentication": {
              "type": "encoded_api_key",
              "encoded_api_key": "xxxxxx"
            },
            "query_type": "simple",
            "roleInformation": "You are a friendly assistant",
            "fields_mapping": {
              "title_field": "titleText",
              "content_fields": ["contentText","contentText.0", "contentText[0]"]
            }
          }
        }
      ],
      "stop_sequences": null,
      "temperature": 0.7,
      "top_p": 0.95,
      "max_tokens": 800,
      "input_max_tokens": 128000
    },
    "augmentation": {
      "augmentation_type": "none"
    }
}

The index in Elasticsearch has the following (simplified) format:

{
    "titleText": "Here is my title",
    "contentText": ["Here is a lot of text", "here is even more text"]
}

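With this configuration, the request to Elasticsearch only searches the titleText field successfully. contentText, which holds all the actual information, is never mapped into the response.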

My guess is that this happens because contentText is an array of strings rather than a single string, and that workarounds such as contentText.0 or contentText[0] do not work.

Any ideas?

elasticsearch azure-openai
1 answer
0 votes

Wouldn't it be easier to use _update_by_query to create a new field that concatenates the elements of contentText?

POST /<index>/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": "ctx._source.contentTextConcat = ctx._source.contentText[0] + ' ; ' + ctx._source.contentText[1]"
}
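Note that the script above joins only the first two elements of contentText. For arrays of any length, the same concatenation could be done client-side before re-indexing. The following is a minimal Python sketch of that idea, not part of the original answer: the function name add_concat_field and its parameters are hypothetical, and it works on a plain dict standing in for a document's _source, so it makes no calls to Elasticsearch.

```python
from typing import Any, Dict


def add_concat_field(
    source: Dict[str, Any],
    src_field: str = "contentText",
    dst_field: str = "contentTextConcat",
    separator: str = " ; ",
) -> Dict[str, Any]:
    """Return a copy of `source` with `dst_field` set to the joined `src_field`.

    Hypothetical helper: handles a missing field, a single string,
    or a list of any length.
    """
    value = source.get(src_field)
    if value is None:
        parts = []
    elif isinstance(value, str):
        parts = [value]
    else:
        parts = [str(item) for item in value]

    updated = dict(source)  # shallow copy, the input dict is left untouched
    updated[dst_field] = separator.join(parts)
    return updated


doc = {
    "titleText": "Here is my title",
    "contentText": ["Here is a lot of text", "here is even more text"],
}
result = add_concat_field(doc)
print(result["contentTextConcat"])
```

The updated documents would then have to be written back to the index with whatever client you already use.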

After that, you can put contentTextConcat in content_fields instead:

{
    "schema": 1.1,
    "description": "testartu",
    "type": "completion",
    "completion": {
      "model": "gpt-4o-mini",
      "completion_type": "chat",
      "include_history": false,
      "include_input": true,
      "data_sources": [
        {
          "type": "elasticsearch",
          "parameters": {
            "endpoint": "https://mytest.es.westeurope.azure.elastic-cloud.com",
            "index_name": "myindex",
            "authentication": {
              "type": "encoded_api_key",
              "encoded_api_key": "xxxxxx"
            },
            "query_type": "simple",
            "roleInformation": "You are a friendly assistant",
            "fields_mapping": {
              "title_field": "titleText",
              "content_fields": ["contentTextConcat"]
            }
          }
        }
      ],
      "stop_sequences": null,
      "temperature": 0.7,
      "top_p": 0.95,
      "max_tokens": 800,
      "input_max_tokens": 128000
    },
    "augmentation": {
      "augmentation_type": "none"
    }
}
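If the configuration is built in Python, the change can be applied programmatically rather than by editing the JSON by hand. This is only a sketch under assumptions: with_content_fields is a hypothetical helper, and it assumes the configuration has been loaded into a dict with the same nesting as shown above.

```python
import copy
from typing import Any, Dict, List


def with_content_fields(config: Dict[str, Any], content_fields: List[str]) -> Dict[str, Any]:
    """Return a deep copy of `config` with new content_fields on every
    elasticsearch data source. Hypothetical helper; the original is not mutated."""
    updated = copy.deepcopy(config)
    for data_source in updated["completion"]["data_sources"]:
        if data_source.get("type") == "elasticsearch":
            mapping = data_source["parameters"].setdefault("fields_mapping", {})
            mapping["content_fields"] = list(content_fields)
    return updated


# Trimmed-down version of the configuration from the question.
config = {
    "completion": {
        "data_sources": [
            {
                "type": "elasticsearch",
                "parameters": {
                    "index_name": "myindex",
                    "fields_mapping": {
                        "title_field": "titleText",
                        "content_fields": ["contentText", "contentText.0", "contentText[0]"],
                    },
                },
            }
        ]
    }
}

new_config = with_content_fields(config, ["contentTextConcat"])
mapping = new_config["completion"]["data_sources"][0]["parameters"]["fields_mapping"]
print(mapping["content_fields"])
```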