I'm using Azure OpenAI "on your data" from Python, with Elasticsearch as my data source. Here is the configuration:
```json
{
  "schema": 1.1,
  "description": "testartu",
  "type": "completion",
  "completion": {
    "model": "gpt-4o-mini",
    "completion_type": "chat",
    "include_history": false,
    "include_input": true,
    "data_sources": [
      {
        "type": "elasticsearch",
        "parameters": {
          "endpoint": "https://mytest.es.westeurope.azure.elastic-cloud.com",
          "index_name": "myindex",
          "authentication": {
            "type": "encoded_api_key",
            "encoded_api_key": "xxxxxx"
          },
          "query_type": "simple",
          "roleInformation": "You are a friendly assistant",
          "fields_mapping": {
            "title_field": "titleText",
            "content_fields": ["contentText", "contentText.0", "contentText[0]"]
          }
        }
      }
    ],
    "stop_sequences": null,
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 800,
    "input_max_tokens": 128000
  },
  "augmentation": {
    "augmentation_type": "none"
  }
}
```
The index on Elasticsearch has the following (simplified) format:
```json
{
  "titleText": "Here is my title",
  "contentText": ["Here is a lot of text", "here is even more text"]
}
```
With this configuration, requests against Elasticsearch only search the `titleText` field successfully. `contentText`, which holds all the actual information, is never mapped into the response. My guess is that this happens because `contentText` is an array of strings rather than a single string, and tweaks like `contentText.0` or `contentText[0]` don't work. Any ideas?
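For reference, this is roughly how the same data source block can be passed from Python. This is a minimal sketch: it only builds the request payload (the `openai` package's client accepts extra request fields via `extra_body`, shown here as a comment); the endpoint, index name, and key are the placeholders from the question.

```python
def build_data_sources(es_endpoint: str, index_name: str, encoded_api_key: str) -> list:
    """Build the "data_sources" block for an Azure OpenAI 'on your data' request."""
    return [
        {
            "type": "elasticsearch",
            "parameters": {
                "endpoint": es_endpoint,
                "index_name": index_name,
                "authentication": {
                    "type": "encoded_api_key",
                    "encoded_api_key": encoded_api_key,
                },
                "query_type": "simple",
                "fields_mapping": {
                    "title_field": "titleText",
                    "content_fields": ["contentText"],
                },
            },
        }
    ]

data_sources = build_data_sources(
    "https://mytest.es.westeurope.azure.elastic-cloud.com", "myindex", "xxxxxx"
)
# This payload would then be sent alongside the chat request, e.g.:
# client.chat.completions.create(model=..., messages=...,
#                                extra_body={"data_sources": data_sources})
```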
Wouldn't it be easier to use `_update_by_query` to create a new field that concatenates the `contentText` elements?
```json
POST /<index>/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": "ctx._source.contentTextConcat = String.join(' ; ', ctx._source.contentText)",
    "lang": "painless"
  }
}
```

(`String.join` handles arrays of any length, not just two elements.)
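For the sample document above, the update-by-query script computes the value shown here in plain Python (the `' ; '` separator is just a visible delimiter; any separator works):

```python
# What the script computes per document: join the contentText
# elements into one string on the new contentTextConcat field.
doc = {
    "titleText": "Here is my title",
    "contentText": ["Here is a lot of text", "here is even more text"],
}

doc["contentTextConcat"] = " ; ".join(doc["contentText"])
print(doc["contentTextConcat"])
# Here is a lot of text ; here is even more text
```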
Afterwards, you can put `contentTextConcat` in `content_fields` instead:
"schema": 1.1,
"description": "testartu",
"type": "completion",
"completion": {
"model": "gpt-4o-mini",
"completion_type": "chat",
"include_history": false,
"include_input": true,
"data_sources": [
{
"type": "elasticsearch",
"parameters": {
"endpoint": "https://mytest.es.westeurope.azure.elastic-cloud.com",
"index_name": "myindex",
"authentication": {
"type": "encoded_api_key",
"encoded_api_key": "xxxxxx"
},
"query_type": "simple",
"roleInformation": "You are a friendly assistant",
"fields_mapping": {
"title_field": "titleText",
"content_fields": ["contentTextConcat"]
}
}
}
],
"stop_sequences": null,
"temperature": 0.7,
"top_p": 0.95,
"max_tokens": 800,
"input_max_tokens": 128000
},
"augmentation": {
"augmentation_type": "none"
}
}