Reindexing from both a remote and a local cluster in Elasticsearch


I have an "index_a" on a remote Elasticsearch cluster that looks like this:

{
   _index: "index_a",
   _type: "_doc",
   _id: "1",
   _score: 1,
   _source: {
      customer_id: "1234",
      customer_name: "spider",
      message: "does what ever"
   }
}, 
{
   _index: "index_a",
   _type: "_doc",
   _id: "2",
   _score: 1,
   _source: {
      customer_id: "3333",
      customer_name: "pig",
      message: "spider-pid does"
   }
}

I also have an "index_a" (yes, the same name!) on the current Elasticsearch cluster, the one where I am running _reindex. It looks like this:

{
   _index: "index_a",
   _type: "_doc",
   _id: "2",
   _score: 1,
   _source: {
      customer_id: "3333",
      customer_name: "pig",
      message: "spider-pid does"
   }
},
{
   _index: "index_a",
   _type: "_doc",
   _id: "3",
   _score: 1,
   _source: {
      customer_id: "9876",
      customer_name: "coronavirus",
      message: "stay safe and at home"
   }
}

As you can see, one document here duplicates a document in the first "index_a" above, but I still want to keep the new data that is here!

What I ultimately want on the current Elasticsearch cluster is this index_b:

{
   _index: "index_b",
   _type: "_doc",
   _id: "1",
   _score: 1,
   _source: {
      customer_id: "1234",
      customer_name: "spider",
      message: "does what ever"
   }
}, 
{
   _index: "index_b",
   _type: "_doc",
   _id: "2",
   _score: 1,
   _source: {
      customer_id: "3333",
      customer_name: "pig",
      message: "spider-pid does"
   }
},
{
   _index: "index_b",
   _type: "_doc",
   _id: "3",
   _score: 1,
   _source: {
      customer_id: "9876",
      customer_name: "coronavirus",
      message: "stay safe and at home"
   }
}

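I know for a fact that I can do this with two _reindex requests: the first one goes from index_a on the remote cluster to index_b on the current Elasticsearch cluster, and the second one goes from index_a on the current cluster to index_b on the current cluster. With big data, however, running these two _reindex requests is very wasteful, because each request basically walks over every doc ID one by one and writes or overwrites it. To do this in a single _reindex request, I have tried: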

POST http://current_cluster/_reindex

{
   "source": {
      "remote": {
         "host": "http://remote_cluster/"
      },
      "index": ["index_a-from-remote", "index_a-of-current"] // renamed here to make them easier to tell apart
   },
   "dest": {
      "index": "index_b"
   }
}
The response says that there is no "index_a-of-current" on the remote cluster, which makes sense: a _reindex request built this way only fetches indices from the remote Elasticsearch cluster.
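For reference, the two-request approach described above can be sketched as a pair of request bodies. This is a minimal sketch, not a confirmed solution: `reindex_body` is a hypothetical helper, the `:9200` port on the remote host is an assumption (the Reindex API expects a scheme, host, and port), and the optional `op_type: create` with `conflicts: proceed` is just one way to have the second pass skip document IDs that already exist in index_b instead of overwriting them.

```python
import json


def reindex_body(source_index, dest_index, remote_host=None, skip_existing=False):
    """Build the JSON body for a POST /_reindex request (hypothetical helper)."""
    source = {"index": source_index}
    if remote_host is not None:
        # Reindex-from-remote: the source is read from another cluster.
        source["remote"] = {"host": remote_host}
    dest = {"index": dest_index}
    body = {"source": source, "dest": dest}
    if skip_existing:
        # Only create documents whose _id is not yet in the destination,
        # and keep going when a version conflict is hit.
        dest["op_type"] = "create"
        body["conflicts"] = "proceed"
    return body


# Request 1: remote index_a -> local index_b (the port is an assumption).
remote_to_b = reindex_body("index_a", "index_b",
                           remote_host="http://remote_cluster:9200")

# Request 2: local index_a -> local index_b, skipping IDs already copied.
local_to_b = reindex_body("index_a", "index_b", skip_existing=True)

for body in (remote_to_b, local_to_b):
    print(json.dumps(body, indent=2))
```

Each body would be sent as `POST http://current_cluster/_reindex`. Whichever request runs second decides which copy of a duplicated `_id` ends up in index_b, unless `skip_existing` is used.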

So my question is:

Is there a way to issue a single _reindex request that takes "index_a" from the remote cluster and "index_a" from the current cluster, and reindexes both into "index_b" on the current cluster?

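I would be glad to hear any suggestions on this, because I have tried many other variations of the request and read the Reindex API documentation, but have not found an answer yet. Thanks for the help!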


elasticsearch indexing bigdata cluster-computing reindex
1 Answer
With cross-cluster search, you may be able to do what you want.
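One way this suggestion might look in practice is sketched below. It is only a sketch under stated assumptions: the alias `remote_a`, the seed address `remote_cluster:9300`, and both helper functions are hypothetical, and whether _reindex accepts cross-cluster index expressions such as `remote_a:index_a` in `source.index` is not confirmed by this thread, so it should be checked against the documentation for your Elasticsearch version. The idea is to register the remote cluster on the current cluster and then name both indices in one local-style _reindex source, without the `remote` block.

```python
import json


def remote_cluster_settings(alias, seeds):
    """Body for PUT /_cluster/settings that registers a remote cluster."""
    return {
        "persistent": {
            "cluster": {
                "remote": {
                    alias: {"seeds": list(seeds)}
                }
            }
        }
    }


def combined_reindex_body(alias, index, dest_index):
    """Body for POST /_reindex reading the remote and the local index together.

    Assumption: the cluster accepts "<alias>:<index>" in source.index.
    """
    return {
        "source": {"index": [f"{alias}:{index}", index]},
        "dest": {"index": dest_index},
    }


# Hypothetical alias and transport address for the remote cluster.
settings = remote_cluster_settings("remote_a", ["remote_cluster:9300"])
reindex = combined_reindex_body("remote_a", "index_a", "index_b")

print(json.dumps(settings, indent=2))
print(json.dumps(reindex, indent=2))
```

If the combined request is accepted, documents that share an `_id` across the two sources would still be written to index_b more than once, so this reduces the number of requests rather than the number of writes.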