I'm running elasticsearch:7.9.3 on Kubernetes via Google Kubernetes Engine. The Elasticsearch data is persisted with a 20GB PersistentVolumeClaim. I tested that the PersistentVolumeClaim was set up correctly by deleting and recreating the Elasticsearch deployment and checking that the data was still available, which it was.
Elasticsearch is set up in a minimal way, so it doesn't need all the extras like autoscaling, Kibana (set up locally and independently on my machine), and so on. The deployment.yaml looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch-db
spec:
  selector:
    matchLabels:
      app: elasticsearch-db
      tier: elastic
  template:
    metadata:
      labels:
        app: elasticsearch-db
        tier: elastic
    spec:
      terminationGracePeriodSeconds: 300
      initContainers:
        # NOTE:
        # This is to fix the permission on the volume
        # By default elasticsearch container is not run as
        # non root user.
        # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_notes_for_production_use_and_defaults
        - name: fix-the-volume-permission
          image: busybox
          command:
            - sh
            - -c
            - chown -R 1000:1000 /usr/share/elasticsearch/data
          securityContext:
            privileged: true
          volumeMounts:
            - name: elasticsearch-db-storage
              mountPath: /usr/share/elasticsearch/data
        # NOTE:
        # To increase the default vm.max_map_count to 262144
        # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode
        - name: increase-the-vm-max-map-count
          image: busybox
          command:
            - sysctl
            - -w
            - vm.max_map_count=262144
          securityContext:
            privileged: true
        # To increase the ulimit
        # https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_notes_for_production_use_and_defaults
        - name: increase-the-ulimit
          image: busybox
          command:
            - sh
            - -c
            - ulimit -n 65536
          securityContext:
            privileged: true
      containers:
        - image: elasticsearch:7.9.3
          name: elasticsearch-db
          ports:
            - name: elk-rest-port
              containerPort: 9200
            - name: elk-nodes-port
              containerPort: 9300
          env:
            - name: discovery.type
              value: single-node
            - name: ES_JAVA_OPTS
              value: -Xms2g -Xmx2g
          volumeMounts:
            - mountPath: /usr/share/elasticsearch/data
              name: elasticsearch-db-storage
      volumes:
        - name: elasticsearch-db-storage
          persistentVolumeClaim:
            claimName: elasticsearch-db-storage-claim
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-db
spec:
  selector:
    app: elasticsearch-db
    tier: elastic
  ports:
    - name: elk-rest-port
      port: 9200
      targetPort: 9200
    - name: elk-nodes-port
      port: 9300
      targetPort: 9300
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-db-storage-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
If I remember correctly, this should default to 1 replica. I noticed that there were 7 evicted pods, and that the one running pod had been restarted twice.
My indices were created manually, and two processes inject roughly 5 to 8 GB of data into this system via Python's elasticsearch library. Somehow, however, this data was lost. Some recently pushed data is still available, but that data may have been pushed to the server after the problem occurred.
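For context on the ingestion path: elasticsearch-py's bulk helper ultimately sends an NDJSON body to the `_bulk` endpoint. A minimal stdlib-only sketch of what that body looks like (the function name is illustrative; `myindex` is the index from the question):

```python
import json

def build_bulk_body(index, docs):
    """Build the NDJSON body of an Elasticsearch _bulk request:
    one action line followed by one source line per document,
    terminated by a trailing newline (the bulk API requires it)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = build_bulk_body("myindex", [{"field": "value1"}, {"field": "value2"}])
print(body)
```

In the real pusher this body is what `elasticsearch.helpers.bulk(...)` assembles and POSTs on your behalf.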
I found this in the logs:
2021-01-07 17:45:41.937 CET "SSL/TLS request received but SSL/TLS is not enabled on this node, got (16,3,1,0), [Netty4TcpChannel{localAddress=/10.32.6.14:9300, remoteAddress=/10.32.6.1:58768}], closing connection"
2021-01-08 00:38:00.604 CET "[.async-search/fQ0TsMW-TaKyYA3qpisrMg] deleting index"
2021-01-08 00:38:00.904 CET "[myindex/dreRruz0TxaQJsSi8U_BOw] deleting index"
2021-01-08 00:38:01.254 CET "[read_me/kMNmyfHoT4KAZyajnWLg2A] deleting index"
2021-01-08 00:38:01.524 CET "[read_me] creating index, cause [api], templates [], shards [1]/[1]"
2021-01-08 00:38:01.904 CET "[read_me/LaGFr3GcR4Gy-by2LUaiaw] create_mapping [_doc]"
2021-01-08 00:38:19.811 CET "[myindex] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]"
2021-01-08 00:38:20.071 CET "[myindex/iQHwXEgKQb6F2Ur8JfFNIw] create_mapping [_doc]"
2021-01-08 02:30:00.002 CET "starting SLM retention snapshot cleanup task"
2021-01-08 02:30:00.003 CET "there are no repositories to fetch, SLM retention snapshot cleanup task complete"
2021-01-08 02:38:00.004 CET "triggering scheduled [ML] maintenance tasks"
2021-01-08 02:38:00.004 CET "Deleting expired data"
2021-01-08 02:38:00.005 CET "Completed deletion of expired ML data"
2021-01-08 02:38:00.006 CET "Successfully completed [ML] maintenance tasks"
2021-01-08 04:14:35.547 CET "[.async-search] creating index, cause [api], templates [], shards [1]/[1]"
2021-01-08 04:14:35.553 CET "updating number_of_replicas to [0] for indices [.async-search]"
My understanding is that deleting an index also deletes its data (even if there is a way to recover it, that is not my concern here).
While researching this problem I came across the meow attack on Elasticsearch databases. That attack essentially deletes indices and creates indices with random-string names ending in -meow (because cats like to knock things off, in this case database tables).
green open .kibana-event-log-7.9.3-000001 ulnmulwTSzmZ2vi6FY6NGg 1 0 4 0 21.6kb 21.6kb
yellow open read_me LaGFr3GcR4Gy-by2LUaiaw 1 1 1 0 4.9kb 4.9kb
green open .kibana_task_manager_1 Q_ud7vO2RN6ImILgNwS4iQ 1 0 6 14071 1.4mb 1.4mb
green open .async-search maKtb69bS-WCQQSTTOTJ4Q 1 0 0 0 3.3kb 3.3kb
green open .kibana_1 XqDgNGuJTzyDeVLyVaM_eQ 1 0 45 1 546.4kb 546.4kb
yellow open myindex iQHwXEgKQb6F2Ur8JfFNIw 1 1 30229 25118 27mb 27mb
My current indices look somewhat similar, except that I have random strings without the meow ending. Could this be a variant of that attack? Admittedly, before the Christmas break getting it running mattered more than properly securing it, although securing it is what I'm doing now.
If this is indeed the problem, would simply turning on authentication fix it?
When I was pushing data to the server a week or two ago, I got this error on the pushing side:
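On the authentication question: turning on basic security would at least stop anonymous clients from deleting indices. As a sketch, assuming the official elasticsearch:7.9.3 image (which maps environment variables matching setting names into elasticsearch.yml, and honors ELASTIC_PASSWORD for the bootstrap password of the elastic user), the container's env section could be extended like this; the password value here is a placeholder and should really come from a Kubernetes Secret via secretKeyRef:

```yaml
          env:
            - name: discovery.type
              value: single-node
            - name: ES_JAVA_OPTS
              value: -Xms2g -Xmx2g
            - name: xpack.security.enabled
              value: "true"
            - name: ELASTIC_PASSWORD
              value: changeme-use-a-secret   # placeholder; prefer a Secret reference
```

Clients (including the Python pushers) then need to supply credentials on every request.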
elasticsearch.exceptions.TransportError: TransportError(429, 'cluster_block_exception', 'index [myindex] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];')
This error seems to have several possible causes, one of which is exhausting all the allocated storage, but my understanding is that this should make the data read-only rather than delete most or all of it. So this may or may not be the cause of my problem. Looking through the logs, I also saw some of these warnings on the server side.
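That understanding matches the documented behavior: the flood-stage watermark only puts a read-only-allow-delete block on indices; it does not delete data. Note that on versions before 7.4 the block is never lifted automatically and must be cleared by hand once disk space has been freed (later versions remove it when usage drops below the high watermark). A sketch of clearing it with curl, with {YOUR_IP} as a placeholder:

```shell
# Remove the read-only-allow-delete block from the index after freeing disk space.
curl -X PUT "http://{YOUR_IP}:9200/myindex/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'
```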
There are several java.lang.OutOfMemoryError occurrences, usually accompanied by long stack traces. I believe these triggered restarts of Elasticsearch, but probably did not cause the problem.
I see this topic is old and unanswered, but I'd like to leave some notes about Elasticsearch ransom attacks. I see read_me among your indices. As in any other ransom attack, it contains 1 document. If you read that index, you will most likely see a ransom demand and a BTC address. I never recommend paying money to get your data back.
Query the read_me index:
curl -X GET "http://{YOUR_IP}:9200/read_me/_search?pretty"
You will most likely see a message like this:
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "read_me",
        "_id" : "1",
        "_score" : 1.0,
        "_ignored" : [
          "message.keyword"
        ],
        "_source" : {
          "message" : "All your data is backed up. You must pay xxxxx BTC to xxxxxxxxxxxxxxxxxxxx In 48 hours,your data will be publicly disclosed and deleted. (more information: go to https://xxxxx/xxxxxx)After paying send mail to us: x [email protected] and we will provide a link for you to download your data. Your DBCODE is: XXXX"
        }
      }
    ]
  }
}
If the message content is similar, this is a ransom attack, and your suspicion of a meow-style attack is not wrong. Always close off free access to your database. Hackers run bots that search for servers with port 9200 open; if a server is not password protected, they leave this message and make your data unusable.
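Following up on "close off free access": besides enabling authentication, you can stop exposing port 9200 to the internet at all. In the asker's manifest the Service is type: LoadBalancer, which on GKE gives it a public IP. If only in-cluster clients need access, a ClusterIP Service is enough; a sketch of the changed Service, with everything except the type unchanged:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-db
spec:
  selector:
    app: elasticsearch-db
    tier: elastic
  ports:
    - name: elk-rest-port
      port: 9200
      targetPort: 9200
    - name: elk-nodes-port
      port: 9300
      targetPort: 9300
  type: ClusterIP   # reachable only inside the cluster, not from the internet
```

External consumers (for example a locally running Kibana) can then go through `kubectl port-forward` instead of a public endpoint.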