How to add new documents with potentially new fields to an Azure AI Search index

Question

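I want an Azure Function that, whenever a new document is added to an Azure Cosmos DB service, automatically adds that document to an existing index in the Azure AI Search service so that it becomes searchable.

The challenge is that a new document may contain new fields, or even new nested fields, that do not exist in the index. My understanding is that these fields have to be added to the index before the document can be added.

My question is whether there is an easy way to handle this use case. I could manually compare the fields of the new document with the existing fields in the index and add the new ones to the index. However, handling nested fields and making this robust could be cumbersome. I assume a function that does this already exists, because when you first create an index, Azure is able to extract the nested fields for you to choose from.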

Edit:

Adding the function code:

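import logging
import os

import azure.functions as func
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

app = func.FunctionApp()

@app.cosmos_db_trigger(arg_name="docs", 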
                       container_name="xxx",
                       database_name="xxxx", 
                       lease_container_name="leases",
                       create_lease_container_if_not_exists="true",
                       connection="xxx") 
def cosmosdb_trigger(docs: func.DocumentList):
    logging.info('Python CosmosDB triggered.')
    if docs:
        logging.info(f'{len(docs)} documents modified.')

        # Initialize the search client
        service_name = 'xxx'
        index_name = 'xxx'
        search_client = SearchClient(endpoint= f'https://{service_name}.search.windows.net', 
                                     index_name=index_name, 
                                     credential=AzureKeyCredential(os.getenv('xxx')))
        logging.info(f'search_client is initiated')
        documents = [doc.to_dict() for doc in docs]

        # todo: since documents may contain fields that don't exist in the index, update the fields of index here before uploading documents.

        search_client.upload_documents(documents=documents)

        logging.info(f'Documents sent to {service_name}')

Answer 1

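The code below handles new documents that are added to Azure Cosmos DB and need to be added automatically to an existing Azure AI Search index, including any new fields or nested fields.

The Cosmos DB-triggered function is activated whenever the Azure Cosmos DB container changes. When a new document is added to the container, the function retrieves it.
The code extracts the fields and nested fields from the document and compares them with the existing fields in the Azure AI Search index.
If it finds any new fields or nested fields, it updates the Azure AI Search index schema to include them, and then adds the document to the index.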
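
The extraction and comparison steps described above could look roughly like the following. This is only a minimal sketch using plain Python dictionaries: the helper names `extract_fields` and `compare_fields`, and the choice of slash-separated paths (such as `address/city`) to identify nested fields, are assumptions made for illustration and are not part of any Azure SDK.

```python
from typing import Any, Dict, List, Set


def extract_fields(document: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Collect the path of every field in a document, including nested ones.

    Nested fields are reported as slash-separated paths, e.g. "address/city".
    """
    paths: Set[str] = set()
    for key, value in document.items():
        path = f"{prefix}/{key}" if prefix else key
        paths.add(path)
        if isinstance(value, dict):
            # Nested object: recurse into its sub-fields.
            paths |= extract_fields(value, path)
        elif isinstance(value, list):
            # Array of objects: collect the sub-fields of every element.
            for item in value:
                if isinstance(item, dict):
                    paths |= extract_fields(item, path)
    return paths


def compare_fields(document_fields: Set[str], existing_fields: Set[str]) -> List[str]:
    """Return the field paths found in the document but missing from the index."""
    return sorted(document_fields - existing_fields)
```

Here `existing_fields` is assumed to be a set of paths built in the same way from the current index definition. Anything returned by `compare_fields` would then have to be added to the index schema before the document is uploaded.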

Packages used: azure.core, azure.cosmos, and azure-search-documents.

The code follows the documentation (DOC) for the Azure Cosmos DB trigger for Azure Functions:

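from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient

def cosmos_db_trigger(documents):
    # Placeholders: replace with your own index name, endpoint and admin key.
    index_name = "index"
    search_service_endpoint = "Search service url"
    search_api_key = "admin keys"
    # Create the index client once instead of once per document.
    search_client = SearchIndexClient(search_service_endpoint, AzureKeyCredential(search_api_key))
    for document in documents:
        new_document = document['data']
        # extract_fields, get_existing_fields, compare_fields, update_index_schema and
        # add_document_to_index are helper functions that are not shown in this answer.
        fields = extract_fields(new_document)
        existing_fields = get_existing_fields(search_client, index_name)
        new_fields = compare_fields(fields, existing_fields)
        if new_fields:
            # The schema must contain the new fields before the document is uploaded.
            update_index_schema(search_client, index_name, new_fields)
        add_document_to_index(search_client, index_name, new_document)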
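
The answer does not show how the schema update itself might work. As a rough sketch under stated assumptions, the step can be modelled on plain dictionaries shaped like the field definitions in an index's JSON (`name`, `type`, and a nested `fields` list for complex types). The helpers `infer_field` and `merge_fields` and the Python-to-EDM type mapping are hypothetical and only illustrate the idea; they are not taken from the Azure SDK.

```python
from typing import Any, Dict, List

Field = Dict[str, Any]


def infer_field(name: str, value: Any) -> Field:
    """Guess an index field definition from a sample value (assumed mapping)."""
    if isinstance(value, dict):
        sub_fields = [infer_field(k, v) for k, v in value.items()]
        return {"name": name, "type": "Edm.ComplexType", "fields": sub_fields}
    if isinstance(value, list):
        # Assumption: the first element is representative; empty lists hold strings.
        item = infer_field(name, value[0] if value else "")
        if "fields" in item:
            return {"name": name, "type": "Collection(Edm.ComplexType)",
                    "fields": item["fields"]}
        return {"name": name, "type": f"Collection({item['type']})"}
    if isinstance(value, bool):  # checked before int, since bool is a subclass of int
        return {"name": name, "type": "Edm.Boolean"}
    if isinstance(value, int):
        return {"name": name, "type": "Edm.Int64"}
    if isinstance(value, float):
        return {"name": name, "type": "Edm.Double"}
    # Strings, None and anything unrecognised fall back to a string field.
    return {"name": name, "type": "Edm.String"}


def merge_fields(existing: List[Field], incoming: List[Field]) -> bool:
    """Add missing fields (recursively) to `existing` in place.

    Returns True if at least one field or sub-field was added.
    """
    changed = False
    by_name = {field["name"]: field for field in existing}
    for field in incoming:
        current = by_name.get(field["name"])
        if current is None:
            existing.append(field)
            by_name[field["name"]] = field
            changed = True
        elif "fields" in field and "fields" in current:
            # Both sides are complex: merge their sub-fields.
            changed = merge_fields(current["fields"], field["fields"]) or changed
    return changed
```

With the azure-search-documents package, the same merge could presumably be applied to the `fields` of the index returned by `SearchIndexClient.get_index`, with the result pushed back through `create_or_update_index` only when `merge_fields` reports a change. Note that this sketch never changes the type of an existing field, and that inferring types from a single sample document can guess wrong (for example, a whole number that later turns out to be a float).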

