How does Elasticsearch split documents into tokens?

Question

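Suppose I have web pages and store them as documents in Elasticsearch. I would like to understand whether Elasticsearch tokenizes every word of both the title and the content, or whether we have to define a key in the document whose text Elasticsearch will extract and turn into tokens for that key. Also, if it tokenizes every key, could you share a sample of the format in which Elasticsearch stores the tokens?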

```
[{
   title: "Getting started with elastic search",
   content: "Elastic search is a popular tool..."
},
{
   title: "Top 10 latest techs",
   content: "This blog will discuss about top 10 latest techs in market.."
}]
```

Tokens: keywords mapped to document ids in which they are found
"Elastic": [1]
"Techs": [2]

Answer 1

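By default, Elasticsearch indexes every field that appears in a document and splits the text of those fields into tokens. You do not need to designate a particular key for this: both `title` and `content` are analyzed.
If you have not specified a mapping for the index, Elasticsearch uses the standard analyzer by default to create the tokens.
For the documents above, it will use the `Standard analyzer`. You can inspect what it produces with the `_analyze` API: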

POST _analyze
{
  "analyzer":"standard",
  "text":"Getting started with elastic search"
}

Tokens

{
  "tokens": [
    {
      "token": "getting",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "started",
      "start_offset": 8,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "with",
      "start_offset": 16,
      "end_offset": 20,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "elastic",
      "start_offset": 21,
      "end_offset": 28,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "search",
      "start_offset": 29,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 4
    }
  ]
}
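
To make the response above easier to read, here is a rough Python sketch that mimics what the standard analyzer does for simple English text: split on non-alphanumeric characters, lowercase each word, and record its offsets and position. It is only an approximation. The real analyzer follows Unicode text segmentation rules, and `simple_standard_analyze` is a made-up helper name, not part of Elasticsearch or any client library.

```python
import re

def simple_standard_analyze(text):
    """Approximate the standard analyzer for plain ASCII text.

    Assumption: a regex over letters and digits stands in for the real
    Unicode-aware tokenizer, so results differ for non-trivial input.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),  # lowercase token filter
            "start_offset": match.start(),   # character offset where the word starts
            "end_offset": match.end(),       # offset just past the last character
            "position": position,            # ordinal position of the token
        })
    return tokens

tokens = simple_standard_analyze("Getting started with elastic search")
for t in tokens:
    print(t)
```

For this input the sketch yields the same five tokens, offsets, and positions as the `_analyze` response above.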

You can refer to this blog to see how the data is actually stored in the inverted index.
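
As a rough illustration of the idea, the sketch below builds a toy inverted index in Python for the two documents from the question, with one term-to-document-ids map per field. The document ids 1 and 2 are taken from the question's own example; `tokenize` and `build_inverted_index` are hypothetical helpers, and the real index stores more than this (for example term frequencies and positions), so treat it as a simplified picture rather than the actual on-disk format.

```python
import re
from collections import defaultdict

# The two documents from the question, keyed by assumed document ids 1 and 2.
docs = {
    1: {"title": "Getting started with elastic search",
        "content": "Elastic search is a popular tool..."},
    2: {"title": "Top 10 latest techs",
        "content": "This blog will discuss about top 10 latest techs in market.."},
}

def tokenize(text):
    # Simplified stand-in for the standard analyzer: split on
    # non-alphanumeric characters and lowercase each word.
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)]

def build_inverted_index(docs):
    # field name -> term -> set of ids of documents containing that term
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        for field, text in doc.items():
            for term in tokenize(text):
                index[field][term].add(doc_id)
    return index

index = build_inverted_index(docs)
print(sorted(index["title"]["elastic"]))    # [1]
print(sorted(index["content"]["elastic"]))  # [1]
print(sorted(index["title"]["techs"]))      # [2]
```

Note that the terms are stored lowercased (`elastic`, `techs`), which is why a match query for "Elastic" still finds document 1: the query text goes through the same analyzer before lookup.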
