I have a set of JSON records structured as follows.
{
"_root": [
{
"Text": "IMPORTANT NOTICE",
"Page": 0,
"Type": "Header3",
"Child": [
{
"Text": "IMPORTANT NOTICE FOR BUYERS",
"Page": 0,
"Type": "Header2",
"Child": [
{
"Text": "IMPORTANT NOTICE FOR SELLERS",
"Page": 0,
"Type": "Header4",
"Child": [
{
"Text": "IMPORTANT INFORMATION",
"Page": 0,
"Type": "Header5",
"Child": [
{
"Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
"Page": 0
}
]
}
]
}
]
}
]
}
],
"_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS"
}
{
"_root": [
{
"Text": "IMPORTANT NOTICE",
"Page": 0,
"Type": "Header2",
"Child": [
{
"Text": "IMPORTANT NOTICE FOR BUYERS",
"Page": 0,
"Type": "Header4",
"Child": [
{
"Text": "IMPORTANT NOTICE FOR SELLERS",
"Page": 0,
"Type": "Header5",
"Child": [
{
"Text": "IMPORTANT INFORMATION",
"Page": 0,
"Type": "Header6",
"Child": [
{
"Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
"Page": 0
}
]
}
]
}
]
}
]
}
],
"_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS"
}
{
"_root": [
{
"Text": "IMPORTANT NOTICE",
"Page": 0,
"Type": "Header1",
"Child": [
{
"Text": "IMPORTANT NOTICE FOR BUYERS",
"Page": 0,
"Type": "Header2",
"Child": [
{
"Text": "IMPORTANT NOTICE FOR SELLERS",
"Page": 0,
"Type": "Header3",
"Child": [
{
"Text": "IMPORTANT INFORMATION",
"Page": 0,
"Type": "Header4",
"Child": [
{
"Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
"Page": 0
}
]
}
]
}
]
}
]
}
],
"_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS"
}
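I store these records in ElasticSearch, and I then need to search each JSON record for a specific keyword. The keyword may or may not sit inside some "nested" part of the JSON structure. In other words, the first query below returns a result, but the one after it does not.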
{
"query": { "match": {"_root.Child.Child.Child.Child.Text" : "OFFERING" } }
}
This one returns no results:
{
"query": { "match": {"_root.Child.Child.Child.Text" : "OFFERING" } }
}
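How can I make the search return the correct results when the JSON documents differ in nesting depth and in key identifiers? Also, I do not have a fixed mapping that defines each record at index time.
Note: I am reposting this question (in improved form) because a colleague posted a similar one earlier and it was closed.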
With your sample documents, it makes sense that the second query returns nothing: there is no OFFERING at that given path!
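To see which path a keyword actually lives at, a small script can walk a record before it is indexed. This is only a sketch: `text_paths` is a hypothetical helper, the sample document is an abbreviated copy of the first record in the question, and the plain substring test is a rough stand-in for how a `match` query analyzes text.

```python
def text_paths(nodes, keyword, prefix="_root"):
    """Return the dotted field paths of every Text value containing keyword."""
    hits = []
    for node in nodes:
        # Case-insensitive substring check (an approximation of a match query).
        if keyword.lower() in node.get("Text", "").lower():
            hits.append(prefix + ".Text")
        # Descend one level; leaf nodes have no Child key.
        hits.extend(text_paths(node.get("Child", []), keyword, prefix + ".Child"))
    return hits


# Abbreviated copy of the first sample record (Page fields omitted).
doc = {"_root": [
    {"Text": "IMPORTANT NOTICE", "Type": "Header3", "Child": [
        {"Text": "IMPORTANT NOTICE FOR BUYERS", "Type": "Header2", "Child": [
            {"Text": "IMPORTANT NOTICE FOR SELLERS", "Type": "Header4", "Child": [
                {"Text": "IMPORTANT INFORMATION", "Type": "Header5", "Child": [
                    {"Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS"},
                ]},
            ]},
        ]},
    ]},
]}

paths = text_paths(doc["_root"], "OFFERING")
print(paths)
```

For this record the only hit is four `Child` levels deep, which is why the query with three `Child` segments finds nothing.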
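Observation: each of your Child sub-objects tends to contain one and only one child object, so flattening the whole thing should not be too difficult. You would still keep the Type identifiers and so on, but the structure would be easier to work with, and the complexity of your queries would drop to a single match query plus possibly "_root.children.Type": "Header6" or similar...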
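One way the flattening could look before indexing is sketched below. The function name `flatten`, the added `Depth` field, and the exact output shape (`_root.children` as a flat list) are assumptions chosen to line up with the field name mentioned above; they are not something the question's data already contains.

```python
def flatten(record):
    """Turn the nested Child chain into a flat list under _root.children."""
    children = []
    # Explicit stack keeps document order and avoids deep recursion.
    stack = [(node, 0) for node in reversed(record.get("_root", []))]
    while stack:
        node, depth = stack.pop()
        # Copy every key except Child, and record how deep the node was.
        entry = {key: value for key, value in node.items() if key != "Child"}
        entry["Depth"] = depth  # hypothetical extra field
        children.append(entry)
        stack.extend((child, depth + 1) for child in reversed(node.get("Child", [])))
    return {"_root": {"children": children}, "_text": record.get("_text")}


# Shortened sample in the same shape as the records in the question.
record = {
    "_root": [
        {"Text": "IMPORTANT NOTICE", "Page": 0, "Type": "Header1", "Child": [
            {"Text": "IMPORTANT INFORMATION", "Page": 0, "Type": "Header4", "Child": [
                {"Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS", "Page": 0},
            ]},
        ]},
    ],
    "_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
}

flat = flatten(record)
```

After this step every text value sits at the single path `_root.children.Text`, whatever the original depth. If a query has to tie a Text value to the Type of the same element, `children` would likely need to be mapped as a `nested` field, since plain object arrays do not keep per-element associations.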
Suboptimal alternative: you could do the following, adding one clause per level until you reach the deepest level.
{
"query": {
"bool": {
"should": [
{
"match": {
"_root.Child.Text": "OFFERING"
}
},
{
"match": {
"_root.Child.Child.Text": "OFFERING"
}
},
{
"match": {
"_root.Child.Child.Child.Text": "OFFERING"
}
},
{
"match": {
"_root.Child.Child.Child.Child.Text": "OFFERING"
}
}
]
}
}
}
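Writing those clauses by hand gets tedious as the depth grows, so the query body can be generated instead. This is a minimal sketch: `depth_query` and `max_depth` are hypothetical names, and unlike the hand-written query it also includes a clause for `_root.Text` so that top-level text is covered.

```python
import json


def depth_query(keyword, max_depth):
    """Build a bool/should query with one match clause per nesting level."""
    clauses = []
    for depth in range(max_depth + 1):
        # depth 0 -> _root.Text, depth 1 -> _root.Child.Text, and so on.
        field = "_root." + "Child." * depth + "Text"
        clauses.append({"match": {field: keyword}})
    return {"query": {"bool": {"should": clauses}}}


query = depth_query("OFFERING", 4)
print(json.dumps(query, indent=2))
```

The resulting dictionary can be sent as the search request body. `max_depth` has to be at least as large as the deepest record in the index, which is the main weakness of this approach.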
Why not use multi_match instead:
{
"query": {
"multi_match": {
"query": "OFFERING",
"fields": ["_root.Child.Text", "_root.Child.Child.Text", "_root.Child.Child.Child.Text", "_root.Child.Child.Child.Child.Text"]
}
}
}