How to programmatically set up cognitive search (with OCR) in Azure Search from Java?


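I want to provide full-text search in my application, so I am trying to configure Azure Search with cognitive search so that both image and non-image documents stored in Azure Blob Storage can be indexed. However, when I configure Azure Search from Java code through the Azure Search REST API, the OCR skill does not take effect and the image documents are not indexed. I seem to be missing some configuration detail when setting up Azure Search this way.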

Case 1: Through the Azure portal, I can

  1. configure Azure Search with cognitive skills (including an OCR skillset), an index, an indexer, and Azure Blob Storage,
  2. index image and non-image documents, such as pdf, png, jpg, xls, etc., and
  3. search the indexed documents.

Case 2: From Java code, using the Azure Search REST API, I can

  1. configure Azure Search with cognitive skills, an index, an indexer, and Azure Blob Storage,
  2. index non-image documents, such as pdf, xls, etc., and
  3. search the indexed documents.

In case 2, however, the OCR skill is not applied and the image documents are not indexed.

I am calling the following Azure Search REST APIs from my Java code:

  1. https://%s.search.windows.net/datasources?api-version=%s
  2. https://%s.search.windows.net/skillsets/cog-search-demo-ss?api-version=%s
  3. https://%s.search.windows.net/indexes/%s?api-version=%s
  4. https://%s.search.windows.net/indexers?api-version=%s
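A minimal Java 11+ sketch of those four calls using java.net.http. The class and method names are hypothetical, and the service name, admin api-key, and api-version are assumed to be supplied by the caller. The data source, skillset, and index have to exist before the indexer that references them is created, so the indexer call comes last. Checking each response status is a cheap way to notice a definition the service rejected instead of silently moving on to the next call.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for the four Azure Search REST calls listed above.
// Assumptions: service name, admin api-key and api-version come from the caller,
// and each JSON body is one of the configuration documents shown in this question.
class SearchSetupRequests {

    private static final String BASE = "https://%s.search.windows.net";

    // Builds a request carrying the JSON body plus the api-key and content-type headers.
    static HttpRequest build(String method, String url, String apiKey, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .header("api-key", apiKey)
                .method(method, HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // 1. POST to the collection; the data source name is taken from the body.
    static HttpRequest dataSource(String service, String apiVersion, String apiKey, String json) {
        String url = String.format(BASE + "/datasources?api-version=%s", service, apiVersion);
        return build("POST", url, apiKey, json);
    }

    // 2. PUT to a named resource: creates or updates skillset "cog-search-demo-ss".
    static HttpRequest skillset(String service, String apiVersion, String apiKey, String json) {
        String url = String.format(BASE + "/skillsets/cog-search-demo-ss?api-version=%s", service, apiVersion);
        return build("PUT", url, apiKey, json);
    }

    // 3. PUT to the named index.
    static HttpRequest index(String service, String indexName, String apiVersion, String apiKey, String json) {
        String url = String.format(BASE + "/indexes/%s?api-version=%s", service, indexName, apiVersion);
        return build("PUT", url, apiKey, json);
    }

    // 4. POST the indexer last, since it references the other three by name.
    static HttpRequest indexer(String service, String apiVersion, String apiKey, String json) {
        String url = String.format(BASE + "/indexers?api-version=%s", service, apiVersion);
        return build("POST", url, apiKey, json);
    }

    // Sends one request and throws on any non-2xx status so a rejected definition is visible.
    static String send(HttpClient client, HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 300) {
            throw new IOException(request.method() + " " + request.uri()
                    + " failed with HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }
}
```

For example, SearchSetupRequests.send(HttpClient.newHttpClient(), SearchSetupRequests.skillset(service, apiVersion, apiKey, skillsetJson)) would raise an exception carrying the service's error message if the skillset definition is rejected; service, apiVersion (for instance 2019-05-06, assumed here), apiKey and skillsetJson are placeholders.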

Configuration JSON:

  1. datasource.json

{
    "name" : "csstoragetest",
    "type" : "azureblob",
    "credentials" : { "connectionString" : "connectionString" },
    "container" : { "name" : "csblob" }
}
  2. skillset.json
{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills":
  [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "null",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text",
          "targetName": "myText"
        },
        {
          "name": "layoutText",
          "targetName": "myLayoutText"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name":"text", "source": "/document/content"
        },
        {
          "name": "itemsToInsert", "source": "/document/normalized_images/*/text"
        },
        {
          "name":"offsets", "source": "/document/normalized_images/*/contentOffset"
        }
      ],
      "outputs": [
        {
          "name": "mergedText", "targetName" : "merged_text"
        }
      ]
    }
  ]
}
  3. index.json
{
  "name": "azureblob-indexing",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
    { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
  ]
}
  4. indexer.json
{
  "name" : "azureblob-indexing1",
  "dataSourceName" : "csstoragetest",
  "targetIndexName" : "azureblob-indexing",
  "schedule" : { "interval" : "PT2H" },
  "skillsetName" : "cog-search-demo-ss",
  "parameters":
  {
    "maxFailedItems":-1,
    "maxFailedItemsPerBatch":-1,
    "configuration":
    {
      "dataToExtract": "contentAndMetadata",
      "imageAction":"generateNormalizedImages",
      "parsingMode": "default",
      "firstLineContainsHeaders": false,
      "delimitedTextDelimiter": ","
    }
  }
}
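When a skill output declares a targetName, the skill writes that output under its context using the targetName instead of the output name. The small checker below is a hypothetical helper, not part of any Azure SDK. It applies that rule to the OCR outputs declared in skillset.json and tests whether the merge skill's itemsToInsert source, /document/normalized_images/*/text, is among the paths the OCR skill produces. (contentOffset comes from image extraction rather than from a skill, so it is not checked here.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical checker: names a skill output the way the enrichment tree does,
// i.e. context + "/" + targetName, falling back to the output name when targetName is absent.
class EnrichmentPaths {

    static String outputPath(String context, String name, String targetName) {
        String node = (targetName == null || targetName.isEmpty()) ? name : targetName;
        return context + "/" + node;
    }

    public static void main(String[] args) {
        String ocrContext = "/document/normalized_images/*";

        // OCR outputs exactly as declared in skillset.json above.
        Map<String, String> produced = new LinkedHashMap<>();
        produced.put("text", outputPath(ocrContext, "text", "myText"));
        produced.put("layoutText", outputPath(ocrContext, "layoutText", "myLayoutText"));

        // Source of the merge skill's "itemsToInsert" input.
        String itemsToInsert = "/document/normalized_images/*/text";
        System.out.println(produced);
        System.out.println("itemsToInsert resolves: " + produced.containsValue(itemsToInsert));
    }
}
```

With the outputs as written ("text" renamed to "myText"), the check reports false, which suggests the merge skill may find no OCR text to insert. Either dropping that targetName or pointing itemsToInsert at /document/normalized_images/*/myText would make the two paths line up.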

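After Azure Search has been configured from Java code, the image documents should be indexed in Azure Search, and I should be able to search them by the text they contain.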
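Skill outputs such as merged_text exist only in the enrichment tree. An indexer normally copies them into the index through an "outputFieldMappings" array, and the indexer.json in the question does not define one. The sketch below renders such an entry. It assumes a hypothetical searchable Edm.String field named merged_text is added to index.json, because the question's index only has id and content; the helper class itself is also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that renders an "outputFieldMappings" property for indexer.json.
// Assumption: index.json gains a searchable Edm.String field named "merged_text".
class OutputFieldMappings {

    // One mapping from an enrichment-tree path to a field of the target index.
    static String mapping(String sourceFieldName, String targetFieldName) {
        return String.format("{ \"sourceFieldName\": \"%s\", \"targetFieldName\": \"%s\" }",
                sourceFieldName, targetFieldName);
    }

    // Joins the mappings into the top-level property that sits next to "parameters".
    static String property(List<String> mappings) {
        return "\"outputFieldMappings\": [ " + String.join(", ", mappings) + " ]";
    }

    public static void main(String[] args) {
        List<String> mappings = new ArrayList<>();
        // The merge skill writes /document/merged_text (context "/document", targetName "merged_text").
        mappings.add(mapping("/document/merged_text", "merged_text"));
        System.out.println(property(mappings));
    }
}
```

The rendered property belongs at the top level of indexer.json, alongside "parameters".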


azure search
1 answer

0 votes

Try setting defaultLanguageCode to null, without quotes, in skillset.json:

"defaultLanguageCode": null,
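In JSON, "null" in quotes is an ordinary four-character string, so the skillset in the question passes the text null as a language code, whereas the unquoted literal null leaves the value unset. When the skillset JSON is assembled in Java, a sketch like the following (a hypothetical class, assuming simple language codes that need no JSON escaping) keeps the two cases apart.

```java
// Hypothetical sketch: renders the OCR skill's "defaultLanguageCode" property.
// A Java null becomes the JSON literal null (no quotes); any other value is quoted.
// Assumes language codes contain no characters that would need JSON escaping.
class LanguageCodeJson {

    static String property(String languageCode) {
        String value = (languageCode == null) ? "null" : "\"" + languageCode + "\"";
        return "\"defaultLanguageCode\": " + value;
    }

    public static void main(String[] args) {
        System.out.println(property(null));   // the form the answer suggests
        System.out.println(property("null")); // the form skillset.json in the question sends
    }
}
```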