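Suppose I have web pages and store them as documents in Elasticsearch. I want to understand whether Elasticsearch tokenizes every word of both the title and the content, or whether we have to define a key in the document from which Elasticsearch extracts words and converts them into tokens. Also, if it tokenizes every key, could you share a sample of the format in which Elasticsearch stores them?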
```
[{
  title: "Getting started with elastic search",
  content: "Elastic search is a popular tool..."
},
{
  title: "Top 10 latest techs",
  content: "This blog will discuss about top 10 latest techs in market.."
}]
```
Tokens: keywords mapped to document ids in which they are found
"Elastic": [1]
"Techs": [2]
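By default, text fields are analyzed with the standard analyzer. You can see the tokens it produces with the `_analyze` API: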
POST _analyze
{
"analyzer":"standard",
"text":"Getting started with elastic search"
}
Tokens
{
"tokens": [
{
"token": "getting",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "started",
"start_offset": 8,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "with",
"start_offset": 16,
"end_offset": 20,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "elastic",
"start_offset": 21,
"end_offset": 28,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "search",
"start_offset": 29,
"end_offset": 35,
"type": "<ALPHANUM>",
"position": 4
}
]
}
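To get a feel for what the analyzer is doing, here is a rough Python sketch that reproduces the offsets and positions shown above. It is only an approximation: the real standard analyzer segments text using Unicode text segmentation rules, whereas this sketch assumes a simple `\w+` regular expression plus lowercasing is close enough for plain English text. The function name `approx_standard_analyze` is made up for illustration, and the `type` field is left out.

```python
import re


def approx_standard_analyze(text):
    """Approximate the standard analyzer: split into word runs, then lowercase.

    Assumption: a \\w+ regex stands in for the real Unicode-aware tokenizer.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),   # lowercase token filter
            "start_offset": match.start(),    # character offset where the token begins
            "end_offset": match.end(),        # offset just past the last character
            "position": position,             # ordinal position of the token in the text
        })
    return tokens


tokens = approx_standard_analyze("Getting started with elastic search")
for token in tokens:
    print(token)
```

For this input the sketch yields the same five tokens, offsets and positions as the `_analyze` response. Inputs with punctuation, apostrophes or non-Latin scripts may well differ from what Elasticsearch returns.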
You can refer to this blog to understand how the actual data is stored in the inverted index.
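As a rough illustration of the idea, the sketch below builds a toy inverted index in Python from the two documents in the question. It assumes the document ids are 1 and 2 (as in the question's token example), uses a simplified regex tokenizer instead of the real analyzer, and keeps one term-to-ids map per field, since Elasticsearch indexes each field separately. The names `tokenize` and `build_inverted_index` are hypothetical, and the actual on-disk format Lucene uses is far more compact than a dictionary of sets.

```python
import re
from collections import defaultdict

# The two documents from the question; ids 1 and 2 are assumed.
documents = {
    1: {"title": "Getting started with elastic search",
        "content": "Elastic search is a popular tool..."},
    2: {"title": "Top 10 latest techs",
        "content": "This blog will discuss about top 10 latest techs in market.."},
}


def tokenize(text):
    """Simplified stand-in for the standard analyzer: word runs, lowercased."""
    return [word.lower() for word in re.findall(r"\w+", text)]


def build_inverted_index(docs):
    """Map field -> token -> set of ids of the documents containing that token."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for token in tokenize(text):
                index[field][token].add(doc_id)
    return index


index = build_inverted_index(documents)
print(sorted(index["title"]["elastic"]))   # [1]
print(sorted(index["content"]["techs"]))   # [2]
```

Note that the tokens are stored lowercased, so the lookup key is `elastic` rather than `Elastic`; a match query on a text field is analyzed the same way at search time, which is why a search for "Elastic" still finds document 1.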