How does Elasticsearch split a document into tokens?

Question (votes: 0, answers: 1)

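Suppose I have web pages and store them as documents in Elasticsearch. I want to understand whether Elasticsearch tokenizes every word of both the title and the content, or whether we have to define a key in the document for which Elasticsearch extracts the words and converts them into tokens. Also, if it tokenizes every key, could you share a sample of the format in which Elasticsearch stores the tokens?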

```
[{
   title: "Getting started with elastic search",
   content: "Elastic search is a popular tool..."
},
{
   title: "Top 10 latest techs",
   content: "This blog will discuss about top 10 latest techs in market.."
}]
```

Tokens: keywords mapped to document ids in which they are found
"Elastic": [1]
"Techs": [2]
elasticsearch tokenize
1 answer
1 vote
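  1. By default, Elasticsearch indexes every field that appears in a document, and it analyzes the text fields, splitting their values into tokens.
  2. If you do not specify a mapping for the index, Elasticsearch uses the standard analyzer by default to create the tokens.
  3. For the documents above, it will therefore use the standard analyzer.
  4. It creates the following tokens, which you can check with the _analyze API: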
```
POST _analyze
{
  "analyzer":"standard",
  "text":"Getting started with elastic search"
}
```

Tokens:

```
{
  "tokens": [
    {
      "token": "getting",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "started",
      "start_offset": 8,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "with",
      "start_offset": 16,
      "end_offset": 20,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "elastic",
      "start_offset": 21,
      "end_offset": 28,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "search",
      "start_offset": 29,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 4
    }
  ]
}
```
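
To relate the token list to the input text, here is a rough Python sketch that yields the same tokens, offsets, and positions for this particular sentence. It is only an approximation. The real standard analyzer segments text according to the Unicode Text Segmentation rules and then lowercases it, whereas the hypothetical `approx_standard_analyze` helper simply matches runs of word characters with a regular expression, so results may differ on punctuation-heavy or non-English text.

```python
import re


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not Lucene's algorithm)."""
    tokens = []
    # Each run of word characters becomes one token; its index is the position.
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),   # the standard analyzer lowercases terms
            "start_offset": match.start(),    # character offset where the token begins
            "end_offset": match.end(),        # offset just past the last character
            # Simplification: always labeled <ALPHANUM> here; the real
            # tokenizer also emits other types such as <NUM>.
            "type": "<ALPHANUM>",
            "position": position,
        })
    return tokens


for tok in approx_standard_analyze("Getting started with elastic search"):
    print(tok)
```

The offsets point back into the original string, and the positions record the order of the terms.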

You can refer to this blog to learn how the data is actually stored in the inverted index.
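
As a rough picture of the idea, the sketch below builds a toy inverted index in Python for the two documents from the question, keeping one term-to-document-ids map per field. Everything in it is an illustrative assumption: the helper names `simple_tokens` and `build_inverted_index` are hypothetical, the tokenizer is a crude regex approximation of the standard analyzer, and Lucene's real on-disk format (segments, term dictionaries, postings with positions and offsets) is far more elaborate.

```python
import re
from collections import defaultdict

# The two documents from the question, keyed by an assumed document id.
docs = {
    1: {"title": "Getting started with elastic search",
        "content": "Elastic search is a popular tool..."},
    2: {"title": "Top 10 latest techs",
        "content": "This blog will discuss about top 10 latest techs in market.."},
}


def simple_tokens(text):
    """Lowercased runs of word characters (approximation of the standard analyzer)."""
    return [m.lower() for m in re.findall(r"\w+", text)]


def build_inverted_index(documents):
    """Map field -> term -> set of ids of the documents containing that term."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for term in simple_tokens(text):
                index[field][term].add(doc_id)
    return index


index = build_inverted_index(docs)
print(sorted(index["title"]["elastic"]))   # [1]
print(sorted(index["content"]["techs"]))   # [2]
```

Note that the terms are stored lowercased, so the entry is `elastic` rather than `Elastic` as written in the question, and each field gets its own term map.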
