Data loss on Elasticsearch when using Kubernetes - indices automatically deleted and created

Question

Setup

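I am running elasticsearch:7.9.3 on Kubernetes via Google Kubernetes Engine. The Elasticsearch data is persisted with a 20GB PersistentVolumeClaim. I checked that the PersistentVolumeClaim was set up correctly by deleting and recreating the Elasticsearch deployment and confirming that the data was still available, and it was.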

Elasticsearch is set up minimally, without extras such as autoscaling or Kibana (Kibana runs locally and separately on my own machine). The deployment.yaml looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch-db
spec:
  selector:
    matchLabels:
      app: elasticsearch-db
      tier: elastic
  template:
    metadata:
      labels:
        app: elasticsearch-db
        tier: elastic
    spec:
      terminationGracePeriodSeconds: 300
      initContainers:
        # NOTE:
        # This is to fix the permission on the volume.
        # The elasticsearch container runs as a non-root
        # user (uid 1000), which must own the data directory.
        # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_notes_for_production_use_and_defaults
        - name: fix-the-volume-permission
          image: busybox
          command:
          - sh
          - -c
          - chown -R 1000:1000 /usr/share/elasticsearch/data
          securityContext:
            privileged: true
          volumeMounts:
          - name: elasticsearch-db-storage
            mountPath: /usr/share/elasticsearch/data
        # NOTE:
        # To increase the default vm.max_map_count to 262144
        # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode
        - name: increase-the-vm-max-map-count
          image: busybox
          command:
          - sysctl
          - -w
          - vm.max_map_count=262144
          securityContext:
            privileged: true
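        # To increase the ulimit
        # NOTE: ulimit only applies to this init container's own shell
        # and does not carry over to the elasticsearch container.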
        # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_notes_for_production_use_and_defaults
        - name: increase-the-ulimit
          image: busybox
          command:
          - sh
          - -c
          - ulimit -n 65536
          securityContext:
            privileged: true
      containers:
      - image: elasticsearch:7.9.3
        name: elasticsearch-db
        ports:
          - name: elk-rest-port
            containerPort: 9200
          - name: elk-nodes-port
            containerPort: 9300
        env:
          - name: discovery.type
            value: single-node
          - name: ES_JAVA_OPTS
            value: -Xms2g -Xmx2g
        volumeMounts:
          - mountPath: /usr/share/elasticsearch/data
            name: elasticsearch-db-storage
      volumes:
        - name: elasticsearch-db-storage
          persistentVolumeClaim:
            claimName: elasticsearch-db-storage-claim

---

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-db
spec:
  selector:
    app: elasticsearch-db
    tier: elastic
  ports:
    - name: elk-rest-port
      port: 9200
      targetPort: 9200
    - name: elk-nodes-port
      port: 9300
      targetPort: 9300
  type: LoadBalancer


---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-db-storage-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

If I remember correctly, this should default to 1 replica. I noticed that there are 7 evicted pods, and the one running pod has restarted twice.

Problem

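My indices were created manually. Two processes pushed roughly 5 to 8 GB of data into this system using Python's elasticsearch library. Somehow, this data has been lost. Some recently pushed data is still available, but that data was probably pushed to the server after the problem occurred.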

Most likely cause

I found this in the logs:

2021-01-07 17:45:41.937 CET "SSL/TLS request received but SSL/TLS is not enabled on this node, got (16,3,1,0), [Netty4TcpChannel{localAddress=/10.32.6.14:9300, remoteAddress=/10.32.6.1:58768}], closing connection"
2021-01-08 00:38:00.604 CET "[.async-search/fQ0TsMW-TaKyYA3qpisrMg] deleting index"
2021-01-08 00:38:00.904 CET "[myindex/dreRruz0TxaQJsSi8U_BOw] deleting index"
2021-01-08 00:38:01.254 CET "[read_me/kMNmyfHoT4KAZyajnWLg2A] deleting index"
2021-01-08 00:38:01.524 CET "[read_me] creating index, cause [api], templates [], shards [1]/[1]"
2021-01-08 00:38:01.904 CET "[read_me/LaGFr3GcR4Gy-by2LUaiaw] create_mapping [_doc]"
2021-01-08 00:38:19.811 CET "[myindex] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]"
2021-01-08 00:38:20.071 CET "[myindex/iQHwXEgKQb6F2Ur8JfFNIw] create_mapping [_doc]"
2021-01-08 02:30:00.002 CET "starting SLM retention snapshot cleanup task"
2021-01-08 02:30:00.003 CET "there are no repositories to fetch, SLM retention snapshot cleanup task complete"
2021-01-08 02:38:00.004 CET "triggering scheduled [ML] maintenance tasks"
2021-01-08 02:38:00.004 CET "Deleting expired data"
2021-01-08 02:38:00.005 CET "Completed deletion of expired ML data" 
2021-01-08 02:38:00.006 CET "Successfully completed [ML] maintenance tasks"
2021-01-08 04:14:35.547 CET "[.async-search] creating index, cause [api], templates [], shards [1]/[1]"
2021-01-08 04:14:35.553 CET "updating number_of_replicas to [0] for indices [.async-search]"
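The following minimal Python sketch uses only the standard library. It pulls the index create and delete events out of log lines in the format shown above, then groups deletions that occur within a few seconds of each other. The regular expression assumes this exact `timestamp zone "[index/uuid] action"` layout. The function names and the 5-second window are illustrative choices and are not part of Elasticsearch.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2021-01-08 00:38:00.904 CET "[myindex/dreRruz0TxaQJsSi8U_BOw] deleting index"
# 2021-01-08 00:38:01.524 CET "[read_me] creating index, cause [api], ..."
LOG_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ '
    r'"\[(?P<index>[^\]/]+)(?:/[^\]]+)?\] (?P<action>deleting index|creating index)'
)


def index_events(lines):
    """Return (timestamp, action, index) tuples for index create/delete lines."""
    events = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m["action"], m["index"]))
    return events


def deletion_bursts(events, window_seconds=5.0):
    """Group 'deleting index' events (assumed in log order) that occur
    within window_seconds of the previous deletion."""
    bursts, current = [], []
    for ts, action, index in events:
        if action != "deleting index":
            continue  # creations do not start or end a burst
        if current and (ts - current[-1][0]).total_seconds() > window_seconds:
            bursts.append(current)
            current = []
        current.append((ts, index))
    if current:
        bursts.append(current)
    return bursts
```

Run against the lines above, this finds a single burst in which `.async-search`, `myindex` and `read_me` are deleted in under a second. A new `read_me` is created with `cause [api]` a fraction of a second later. That timing points to a client issuing API calls in sequence rather than an internal maintenance task.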

My understanding is that deleting an index also deletes its data (even if there is a way to recover it, that is not my concern here).

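While researching this problem, I came across the meow attack on Elasticsearch databases. The attack essentially deletes indices and creates new indices named with random strings ending in -meow (since cats like to knock things off tables, in this case database tables).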

green  open .kibana-event-log-7.9.3-000001 ulnmulwTSzmZ2vi6FY6NGg 1 0     4     0  21.6kb  21.6kb
yellow open read_me                        LaGFr3GcR4Gy-by2LUaiaw 1 1     1     0   4.9kb   4.9kb
green  open .kibana_task_manager_1         Q_ud7vO2RN6ImILgNwS4iQ 1 0     6 14071   1.4mb   1.4mb
green  open .async-search                  maKtb69bS-WCQQSTTOTJ4Q 1 0     0     0   3.3kb   3.3kb
green  open .kibana_1                      XqDgNGuJTzyDeVLyVaM_eQ 1 0    45     1 546.4kb 546.4kb
yellow open myindex                        iQHwXEgKQb6F2Ur8JfFNIw 1 1 30229 25118    27mb    27mb
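The listing above is the default output of `GET _cat/indices` without `?v`, so it has no header row. The columns are health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size and pri.store.size. The minimal Python sketch below parses those rows and flags two things:

- index names that match the ransom patterns discussed here (`read_me`, or a `-meow` suffix);
- yellow indices whose replicas cannot be allocated on a single node.

The name patterns are only a heuristic, not an official list of indicators.

```python
from typing import NamedTuple


class IndexRow(NamedTuple):
    health: str
    status: str
    index: str
    uuid: str
    pri: int
    rep: int
    docs_count: int
    docs_deleted: int
    store_size: str
    pri_store_size: str


def parse_cat_indices(text):
    """Parse the default, header-less output of GET _cat/indices."""
    rows = []
    for line in text.strip().splitlines():
        f = line.split()
        if len(f) != 10:
            continue  # skip blank, closed-index or otherwise unexpected lines
        rows.append(IndexRow(f[0], f[1], f[2], f[3], int(f[4]), int(f[5]),
                             int(f[6]), int(f[7]), f[8], f[9]))
    return rows


def flag_indices(rows):
    """Return warnings for a single-node cluster (heuristic name checks)."""
    warnings = []
    for r in rows:
        if r.index.endswith("-meow") or r.index == "read_me":
            warnings.append(f"{r.index}: name matches a known ransom/wiper pattern")
        if r.health == "yellow" and r.rep > 0:
            warnings.append(f"{r.index}: {r.rep} replica(s) cannot be allocated on a single node")
    return warnings
```

The listing also shows a size mismatch. The recreated `myindex` holds 30229 documents (27mb), far less than the 5 to 8 GB that was pushed. Both `myindex` and `read_me` were created with the default of 1 replica, which is why they are yellow on a single-node cluster.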

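My current indices look somewhat similar: they have random strings, but no meow ending. Could this be a variant of that attack? Admittedly, getting the system running before the Christmas holidays took priority over properly securing it, which is what I am doing now.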

If this really is the problem, simply enabling authentication should fix it, right?

Other potential causes

While I was pushing data to the server over a week or two, I received this error on the pushing side:

elasticsearch.exceptions.TransportError: TransportError(429, 'cluster_block_exception', 'index [myindex] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];')

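This error seems to have several possible causes, one of which is running out of allocated storage. My understanding, however, is that this should make the data read-only rather than delete most or all of it, so it may or may not be the cause of my problem. Looking through the logs, I also saw some of these warnings on the server.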
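The flood-stage error is controlled by the disk-based shard allocation watermarks. Their 7.x defaults are 85% (low), 90% (high) and 95% (flood_stage) of the disk. The minimal Python sketch below classifies a given usage level against those defaults. It assumes the percentage form of the settings; they can also be set as absolute byte values, which this sketch does not handle.

```python
GIB = 1024 ** 3

# Default percentage watermarks in Elasticsearch 7.x, checked from the
# highest down (configurable via
# cluster.routing.allocation.disk.watermark.{low,high,flood_stage}).
DEFAULT_WATERMARKS = (("flood_stage", 95.0), ("high", 90.0), ("low", 85.0))


def watermark_state(used_bytes, total_bytes, watermarks=DEFAULT_WATERMARKS):
    """Return the highest watermark exceeded by disk usage, or 'ok'."""
    used_pct = 100.0 * used_bytes / total_bytes
    for name, pct in watermarks:
        if used_pct > pct:
            return name
    return "ok"
```

On the 20Gi volume from the question, flood stage is reached once more than 19 GiB is in use. At that point Elasticsearch puts `index.blocks.read_only_allow_delete` on every index that has a shard on that node. Writes are then rejected with the 429 above, but existing documents are not removed. This matches the reading that the block alone does not explain the loss.

Since 7.4, the block is released automatically once usage falls back below the high watermark. It can also be cleared manually with `PUT myindex/_settings` and the body `{"index.blocks.read_only_allow_delete": null}`.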

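There are several java.lang.OutOfMemoryError entries, usually accompanied by many stack traces. I believe these triggered the Elasticsearch restarts, but they probably did not cause the data loss.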

Answer 1

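I see this thread is old and has no answers, so I'd like to leave some notes on Elasticsearch ransom attacks. I can see read_me among your indices. As in other ransom attacks of this kind, it contains 1 document. If you read that index, you will most likely see a ransom demand and a BTC address. I never recommend paying to get your data back.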

Query the read_me index:

curl -X GET "http://{YOUR_IP}:9200/read_me/_search?pretty"

You will most likely see a message like this:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "read_me",
        "_id" : "1",
        "_score" : 1.0,
        "_ignored" : [
          "message.keyword"
        ],
        "_source" : {
          "message" : "All your data is backed up. You must pay xxxxx BTC to xxxxxxxxxxxxxxxxxxxx In 48 hours,your data will be publicly disclosed and deleted. (more information: go to https://xxxxx/xxxxxx)After paying send mail to us: xxxxx@xxxxx and we will provide a link for you to download your data. Your DBCODE is: XXXX"
        }
      }
    ]
  }
}
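The same check can be scripted. The minimal Python sketch below uses only the standard library and has three parts:

- `fetch_read_me` performs the request from the curl example above; the base URL is a placeholder.
- `ransom_messages` pulls the `message` field out of each hit.
- `looks_like_ransom_note` is a crude, assumed keyword heuristic that only checks for a Bitcoin reference.

```python
import json
import urllib.request


def fetch_read_me(base_url):
    """GET <base_url>/read_me/_search and return the parsed JSON body.

    base_url is a placeholder such as "http://YOUR_IP:9200".
    """
    with urllib.request.urlopen(f"{base_url}/read_me/_search", timeout=10) as resp:
        return json.load(resp)


def ransom_messages(search_response):
    """Collect the 'message' field of every hit in a _search response."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [h.get("_source", {}).get("message", "") for h in hits]


def looks_like_ransom_note(message):
    """Crude heuristic: the notes left by these attacks ask for Bitcoin."""
    text = message.lower()
    return "btc" in text or "bitcoin" in text
```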

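If the message content is similar, this is a ransom attack, so your suspicion of a meow-style attack was not wrong. Always close off open access to your database. Attackers run bots that scan for servers with port 9200 reachable. If a server is not password-protected, they leave this message and make your data unusable.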
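To check whether your own cluster is in the state these bots look for, the minimal Python sketch below sends an anonymous request to the REST port. With security enabled, Elasticsearch rejects the request with HTTP 401; an unprotected node answers with 200. Only run it against clusters you own. The URL is a placeholder.

```python
import urllib.error
import urllib.request


def answers_without_auth(base_url, timeout=5.0):
    """Return True if GET <base_url>/ succeeds without credentials.

    A 401 (authentication required) returns False; any other HTTP error
    or connection failure is re-raised for the caller to inspect.
    """
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False
        raise
```

For example, `answers_without_auth("http://YOUR_IP:9200")` returning True means anyone who can reach that address can read and delete your indices.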

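Always enable password security on your Elasticsearch instance.
Never expose your Elasticsearch instance to the internet unless you know what you are doing.
Use a non-default username for Elasticsearch.
Change the default port of your Elasticsearch instance. (This only reduces the chance of being hit by ransomware bots.)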
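The following is a minimal sketch of the first point, applied to the setup from the question. It assumes the official image's convention, already used there for discovery.type, of turning environment variables into settings. It builds a JSON Patch that adds two variables:

- `xpack.security.enabled=true`;
- `ELASTIC_PASSWORD`, the image's variable for the initial password of the built-in `elastic` user, read from a Kubernetes Secret.

The Secret name and key are hypothetical, and the Secret must be created first.

```python
import json

# Hypothetical Secret holding the password for the built-in "elastic" user;
# create it beforehand, e.g. with `kubectl create secret generic`.
SECRET_NAME = "elasticsearch-credentials"
SECRET_KEY = "password"


def security_env_patch(container_index=0):
    """Build a JSON Patch (RFC 6902) that appends security env vars to the
    container at container_index in the Deployment's pod template."""
    path = f"/spec/template/spec/containers/{container_index}/env/-"
    env = [
        # Turns on authentication (included in the free basic license in 7.x).
        {"name": "xpack.security.enabled", "value": "true"},
        # Initial password for the built-in "elastic" superuser.
        {
            "name": "ELASTIC_PASSWORD",
            "valueFrom": {"secretKeyRef": {"name": SECRET_NAME, "key": SECRET_KEY}},
        },
    ]
    return [{"op": "add", "path": path, "value": e} for e in env]


if __name__ == "__main__":
    # Feed the output to:
    # kubectl patch deployment elasticsearch-db --type=json -p '<output>'
    print(json.dumps(security_env_patch()))
```

This only adds authentication. For the second point, changing the Service from type: LoadBalancer to ClusterIP keeps ports 9200 and 9300 reachable only from inside the cluster.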
