How to programmatically set up cognitive search (with OCR) in Azure Search via Java?

Question

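I want to provide full-text search in my application, so I am trying to configure Azure Search with cognitive search so that both images and non-image documents stored in Azure Blob Storage can be indexed. However, when I configure Azure Search from Java code through the Azure Search REST API, the OCR capability is not applied and image documents are not indexed. I seem to be missing some configuration details when configuring Azure Search from Java code (using the Azure Search REST API).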

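Case 1: Through the Azure portal, I can configure Azure Search with cognitive skills (including the OCR skillset), an index, an indexer, and Azure Blob Storage to:

index image and non-image documents such as pdf, png, jpg, xls, etc.
search the indexed documents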

Case 2: From Java code using the Azure REST API, I can configure Azure Search with cognitive skills, an index, an indexer, and Azure Blob Storage to:

index non-image documents such as pdf, xls, etc.
search the indexed documents

However, in case 2 the OCR capability is not applied and image documents are not indexed.

I am calling the following Azure Search REST APIs from Java code:

1. https://%s.search.windows.net/datasources?api-version=%s
2. https://%s.search.windows.net/skillsets/cog-search-demo-ss?api-version=%s
3. https://%s.search.windows.net/indexes/%s?api-version=%s
4. https://%s.search.windows.net/indexers?api-version=%s

Configuration JSON:

1. datasource.json

{
  "name": "csstoragetest",
  "type": "azureblob",
  "credentials": { "connectionString": "connectionString" },
  "container": { "name": "csblob" }
}
2. skillset.json

{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills":
  [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "null",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text",
          "targetName": "myText"
        },
        {
          "name": "layoutText",
          "targetName": "myLayoutText"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name":"text", "source": "/document/content"
        },
        {
          "name": "itemsToInsert", "source": "/document/normalized_images/*/text"
        },
        {
          "name":"offsets", "source": "/document/normalized_images/*/contentOffset"
        }
      ],
      "outputs": [
        {
          "name": "mergedText", "targetName" : "merged_text"
        }
      ]
    }
  ]
}
3. index.json

{
  "name": "azureblob-indexing",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
    { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
  ]
}
4. indexer.json

{
  "name": "azureblob-indexing1",
  "dataSourceName": "csstoragetest",
  "targetIndexName": "azureblob-indexing",
  "schedule": { "interval": "PT2H" },
  "skillsetName": "cog-search-demo-ss",
  "parameters": {
    "maxFailedItems": -1,
    "maxFailedItemsPerBatch": -1,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "imageAction": "generateNormalizedImages",
      "parsingMode": "default",
      "firstLineContainsHeaders": false,
      "delimitedTextDelimiter": ","
    }
  }
}
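For reference, here is a minimal sketch of how the four files above might be sent to the endpoints listed earlier using Java 11+ `java.net.http.HttpClient`. The question does not show its Java code, so several things are assumptions: the class name `AzureSearchSetup`, reading the JSON files from a local directory, passing an admin key in the `api-key` header, and the choice of POST or PUT for each endpoint. The resources are created in dependency order because the indexer refers to the data source, skillset, and index by name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not the asker's code): sends the four configuration files in order.
final class AzureSearchSetup {
    // Format strings from the question: service name first, api-version last.
    static final String DATASOURCES = "https://%s.search.windows.net/datasources?api-version=%s";
    static final String SKILLSET = "https://%s.search.windows.net/skillsets/cog-search-demo-ss?api-version=%s";
    static final String INDEX = "https://%s.search.windows.net/indexes/%s?api-version=%s";
    static final String INDEXERS = "https://%s.search.windows.net/indexers?api-version=%s";

    // Builds a JSON request authenticated with an admin key in the api-key header.
    static HttpRequest jsonRequest(String method, String url, String adminKey, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .header("api-key", adminKey)
                .method(method, HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Data source, skillset and index first; the indexer references all three, so it goes last.
    static void configure(String service, String apiVersion, String adminKey, Path dir) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest[] requests = {
            jsonRequest("POST", String.format(DATASOURCES, service, apiVersion), adminKey,
                    Files.readString(dir.resolve("datasource.json"))),
            jsonRequest("PUT", String.format(SKILLSET, service, apiVersion), adminKey,
                    Files.readString(dir.resolve("skillset.json"))),
            jsonRequest("PUT", String.format(INDEX, service, "azureblob-indexing", apiVersion), adminKey,
                    Files.readString(dir.resolve("index.json"))),
            jsonRequest("POST", String.format(INDEXERS, service, apiVersion), adminKey,
                    Files.readString(dir.resolve("indexer.json")))
        };
        for (HttpRequest request : requests) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            // Fail fast so that a rejected definition (for example an HTTP 400 on the skillset) is not ignored.
            if (response.statusCode() >= 300) {
                throw new IllegalStateException(request.uri() + " -> " + response.statusCode() + ": " + response.body());
            }
        }
    }
}
```

Checking every status code matters here: if the skillset request is rejected and the error is discarded, later steps may still appear to succeed. POST is a create call, so re-running the sketch against existing resources may be rejected. Using PUT with the resource name in the URL is the usual create-or-update alternative.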

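After Azure Search has been configured from Java code, image documents should be indexed in Azure Search, and I should be able to search them by the text they contain.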
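To check this expectation from Java, you can query the index for a word that appears only inside one of the images. A query can only match text that the indexer actually wrote into a searchable field of the index, such as `content`. The sketch below assumes the Search Documents GET endpoint (`indexes/{index}/docs`), a query key in the `api-key` header, and a hypothetical class name `AzureSearchQuery`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: issues a simple full-text query against the index.
final class AzureSearchQuery {
    // URL-encodes the search text; '+' is replaced so spaces are sent as %20.
    static URI searchUri(String service, String index, String apiVersion, String text) {
        String q = URLEncoder.encode(text, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(String.format(
                "https://%s.search.windows.net/indexes/%s/docs?api-version=%s&search=%s",
                service, index, apiVersion, q));
    }

    // Returns the raw JSON response body; an empty "value" array means no document matched.
    static String search(String service, String index, String apiVersion, String queryKey, String text)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder(searchUri(service, index, apiVersion, text))
                .header("api-key", queryKey)
                .GET()
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```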


Answer 1

Try setting defaultLanguageCode to null, without quotes, in skillset.json:

"defaultLanguageCode": null,
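If skillset.json is read from disk and sent unchanged from Java, you can also correct the quoted value in code before the request is built. The string `"null"` and the JSON literal `null` are different values. The following is a minimal string-level sketch that assumes no JSON library is in use. The class name `SkillsetFix` is hypothetical, and simply editing the file is the simpler fix.

```java
import java.util.regex.Pattern;

// Hypothetical helper: replaces the string "null" with a JSON null literal for defaultLanguageCode.
final class SkillsetFix {
    // Matches "defaultLanguageCode" : "null" with any whitespace around the colon.
    private static final Pattern QUOTED_NULL =
            Pattern.compile("\"defaultLanguageCode\"\\s*:\\s*\"null\"");

    static String unquoteDefaultLanguageCode(String skillsetJson) {
        return QUOTED_NULL.matcher(skillsetJson).replaceAll("\"defaultLanguageCode\": null");
    }
}
```

If the skillset is built with a JSON library instead, the same rule applies: write an explicit JSON null (or omit the property), not the text "null".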
