如何在azure AI搜索中使用Split Skill获取块索引？

Question

我是 Azure AI 搜索的新手，我想从该技能集中获取属性块索引，以了解该块位于文档中的哪个索引。分割后的页面内容如下所示

{'values': [{'recordId': '0', 'data': {'text': 'sample data 1 '}}, {'recordId': '1', 'data': {'text': 'sample data 1'}}, {'recordId': '2', 'data': {'text': 'sample data 3'}}

如何将 recordId 值复制为字段。

{
  "name": "testing-phase-1-docs-skillset",
  "description": "Skillset to chunk documents and generate embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#3",
      "description": "Split skill to chunk documents",
      "context": "/document",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ],
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 500,
      "unit": "characters"
    }
  ],
  "@odata.etag": "\"0x8DD029DA50735BD\"",
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "testing-phase-1-docs-index",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {
            "name": "content",
            "source": "/document/pages/*"
          }, // want to add a recordId here
  
          {
            "name": "metadata_title",
            "source": "/document/metadata_title"
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  }
}

Answer 1

如何在azure AI搜索中使用Split Skill获取块索引？

添加自定义技能，为每个块分配一个

chunkIndex

字段，代表其位置，通过此您可以在使用

SplitSkill

拆分后跟踪文档中的块索引。

```
chunkIndex
```
投影到搜索索引中，使您能够了解每个块在原始文档中的确切位置。

{
  "name": "testing-phase-1-docs-skillset",
  "description": "Skillset to chunk documents, assign a recordId and chunkIndex to each chunk, and generate embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "documentChunkingSkill",
      "description": "Splits document into chunks",
      "context": "/document",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ],
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 500,
      "unit": "characters"
    },
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
      "name": "generateRecordIdAndChunkIndexSkill",
      "description": "Generates a unique recordId and chunkIndex for each chunk",
      "context": "/document/pages/*",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*/text"
        }
      ],
      "outputs": [
        {
          "name": "recordId",
          "targetName": "recordId"
        },
        {
          "name": "chunkIndex",
          "targetName": "chunkIndex"
        }
      ]
    }
  ],
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "testing-phase-1-docs-index",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {
            "name": "content",
            "source": "/document/pages/*/text"
          },
          {
            "name": "recordId",
            "source": "/document/pages/*/recordId"
          },
          {
            "name": "chunkIndex",
            "source": "/document/pages/*/chunkIndex"
          },
          {
            "name": "metadata_title",
            "source": "/document/metadata_title"
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  }
}

在上面，我添加了一个自定义技能来检查每个块并获得一个独特的

recordId

。

它根据指定的页面长度将文档分割成块。
它为每个块生成一个唯一的
```
recordId
```
。
它将拆分内容和
```
recordId
```
映射到最终索引。

按预期为每个块生成了

recordId

。

enter image description here

如何在azure AI搜索中使用Split Skill获取块索引？

问题描述投票：0回答：1

1个回答

最新问题

如何在azure AI搜索中使用Split Skill获取块索引？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1