Redis cluster fails to recover previously persisted data after host restart

Redis version: v7.0.12

Hello.

I have deployed a Redis cluster in a Kubernetes cluster using ot-helm/redis-operator, with the following values:

redisCluster:
  redisSecret:
    secretName: redis-password
    secretKey: REDIS_PASSWORD
  leader:
    replicas: 3
    affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: test
                    operator: In
                    values:
                      - "true"
  follower:
    replicas: 3
    affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: test
                    operator: In
                    values:
                      - "true"
externalService:
  enabled: true
  serviceType: LoadBalancer
  port: 6379
redisExporter:
  enabled: true
storageSpec:
  volumeClaimTemplate:
    spec:
      resources:
        requests:
          storage: 10Gi
  nodeConfVolumeClaimTemplate:
    spec:
      resources:
        requests:
          storage: 1Gi

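After adding a few keys to the cluster, I stopped the host (an EC2 instance) on which the Redis cluster is deployed and then started it again. Once the EC2 instance and the Redis cluster were back up, the keys I had added before the restart were gone.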

I have both persistence methods enabled (RDB and AOF). Here is my Redis cluster's persistence configuration (the defaults):

config get dir # /data
config get dbfilename # dump.rdb
config get appendonly # yes
config get appendfilename # appendonly.aof

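I noticed that while and after adding keys/data to Redis, /data/dump.rdb and /data/appendonlydir/appendonly.aof.1.incr.aof (on my primary Redis cluster leader) grow in size. When I restart the EC2 instance, however, /data/dump.rdb drops back to 0 bytes, while /data/appendonlydir/appendonly.aof.1.incr.aof stays the same size as before the restart.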

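I can confirm this from a screenshot of my Grafana dashboard, which monitors the persistent volume attached to the primary Redis cluster leader. As I understand it, the volume holds both the AOF and the RDB data until a few seconds after the Redis cluster restarts, at which point the RDB data is deleted. Here is the Prometheus metric I am using, in case anyone is wondering: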

sum(kubelet_volume_stats_used_bytes{namespace=~"test", persistentvolumeclaim="redis-cluster-leader-redis-cluster-leader-0"}/(1024*1024)) by (persistentvolumeclaim)

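So the Redis cluster really is using both RDB and AOF to back up the data, but once it restarts (after the EC2 restart) it loses the RDB data, and for some reason the AOF is not enough to recover the keys/data.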

Here are the logs from the Redis cluster restart:

ACL_MODE is not true, skipping ACL file modification
Starting redis service in cluster mode.....
12:C 17 Sep 2024 00:49:39.351 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
12:C 17 Sep 2024 00:49:39.351 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=12, just started
12:C 17 Sep 2024 00:49:39.351 # Configuration loaded
12:M 17 Sep 2024 00:49:39.352 * monotonic clock: POSIX clock_gettime
12:M 17 Sep 2024 00:49:39.353 * Node configuration loaded, I'm ef200bc9befd1c4fb0f6e5acbb1432002a7c2822
12:M 17 Sep 2024 00:49:39.353 * Running mode=cluster, port=6379.
12:M 17 Sep 2024 00:49:39.353 # Server initialized
12:M 17 Sep 2024 00:49:39.355 * Reading RDB base file on AOF loading...
12:M 17 Sep 2024 00:49:39.355 * Loading RDB produced by version 7.0.12
12:M 17 Sep 2024 00:49:39.355 * RDB age 2469 seconds
12:M 17 Sep 2024 00:49:39.355 * RDB memory usage when created 1.51 Mb
12:M 17 Sep 2024 00:49:39.355 * RDB is base AOF
12:M 17 Sep 2024 00:49:39.355 * Done loading RDB, keys loaded: 0, keys expired: 0.
12:M 17 Sep 2024 00:49:39.355 * DB loaded from base file appendonly.aof.1.base.rdb: 0.001 seconds
12:M 17 Sep 2024 00:49:39.598 * DB loaded from incr file appendonly.aof.1.incr.aof: 0.243 seconds
12:M 17 Sep 2024 00:49:39.598 * DB loaded from append only file: 0.244 seconds
12:M 17 Sep 2024 00:49:39.598 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
12:M 17 Sep 2024 00:49:39.599 * Ready to accept connections
12:M 17 Sep 2024 00:49:41.611 # Cluster state changed: ok
12:M 17 Sep 2024 00:49:46.592 # Cluster state changed: fail
12:M 17 Sep 2024 00:50:02.258 * DB saved on disk
12:M 17 Sep 2024 00:50:21.376 # Cluster state changed: ok
12:M 17 Sep 2024 00:51:26.284 * Replica 192.168.58.43:6379 asks for synchronization
12:M 17 Sep 2024 00:51:26.284 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '995d7ac6eedc09d95c4fc184519686e9dc8f9b41', my replication IDs are '654e768d51433cc24667323f8f884c66e8e55566' and '0000000000000000000000000000000000000000')
12:M 17 Sep 2024 00:51:26.284 * Replication backlog created, my new replication IDs are 'de979d9aa433bf37f413a64aff751ed677794b00' and '0000000000000000000000000000000000000000'
12:M 17 Sep 2024 00:51:26.284 * Delay next BGSAVE for diskless SYNC
12:M 17 Sep 2024 00:51:31.195 * Starting BGSAVE for SYNC with target: replicas sockets
12:M 17 Sep 2024 00:51:31.195 * Background RDB transfer started by pid 218
218:C 17 Sep 2024 00:51:31.196 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
12:M 17 Sep 2024 00:51:31.196 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
12:M 17 Sep 2024 00:51:31.202 * Background RDB transfer terminated with success
12:M 17 Sep 2024 00:51:31.202 * Streamed RDB transfer with replica 192.168.58.43:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
12:M 17 Sep 2024 00:51:31.203 * Synchronization with replica 192.168.58.43:6379 succeeded

Here is the output of the redis-cli command INFO PERSISTENCE after adding some data:

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1726552373
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_saves:5
rdb_last_cow_size:1093632
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:37092089
aof_base_size:89
aof_pending_rewrite:0
aof_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
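
For readers who want to check the same fields on their own node, here is a minimal sketch that parses an INFO section into a dictionary and looks at the AOF counters. The helper name parse_info_section is hypothetical, and the sample text is a subset of the output shown above; it is only an illustration, not a diagnostic tool.

```python
from typing import Dict


def parse_info_section(text: str) -> Dict[str, str]:
    """Parse 'key:value' lines from Redis INFO output into a dict of strings."""
    fields: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Persistence"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


# Subset of the INFO PERSISTENCE output quoted in the question.
sample = """# Persistence
rdb_last_load_keys_loaded:0
aof_enabled:1
aof_rewrites:0
aof_current_size:37092089
aof_base_size:89
"""

info = parse_info_section(sample)
base_size = int(info["aof_base_size"])
current_size = int(info["aof_current_size"])
print(f"AOF base size: {base_size} bytes, current AOF size: {current_size} bytes")
if info["aof_rewrites"] == "0" and base_size < current_size:
    # No rewrite has folded the writes into the base file yet.
    print("No AOF rewrite has run yet, so the writes sit in the incremental AOF file.")
```

With the values above, the base file is tiny and no rewrite has happened, which matches the startup log reporting 0 keys loaded from appendonly.aof.1.base.rdb.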

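In case anyone is wondering, the persistent volume is correctly attached to the Redis cluster at the /data mount path. Here is a snippet of the primary Redis cluster leader's YAML definition (generated automatically by Helm and the Redis Operator):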

apiVersion: v1
kind: Pod
metadata:
  name: redis-cluster-leader-0
  namespace: test
[...]
spec:
  containers:
    [...]
    volumeMounts:
    - mountPath: /node-conf
      name: node-conf
    - mountPath: /data
      name: redis-cluster-leader
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-7ds8c
      readOnly: true
  [...]
  volumes:
  - name: node-conf
    persistentVolumeClaim:
      claimName: node-conf-redis-cluster-leader-0
  - name: redis-cluster-leader
    persistentVolumeClaim:
      claimName: redis-cluster-leader-redis-cluster-leader-0
[...]

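I have spent several days on this issue and searched everywhere, to no avail. I would appreciate any kind of help, and I am happy to provide any further information that is needed. Thank you very much.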

redis persistence redis-cluster
1 Answer

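I noticed that /data/appendonlydir/appendonly.aof.1.incr.aof ended with a FLUSHALL command. When Redis replays the AOF on startup it re-executes every command in the file, so that final FLUSHALL wiped the keys that had just been loaded. I solved the problem by adding rename-command FLUSHALL "" to my redis.conf, which disables the command.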
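
To check whether an AOF file contains such a command, a small scanner can walk the file and report any FLUSHALL or FLUSHDB it finds. The sketch below assumes the incremental file is a plain sequence of RESP-encoded commands (optionally with annotation lines starting with "#"), and the function names iter_aof_commands and find_flush_commands are hypothetical. It does not handle a truncated tail gracefully and is meant for inspection only.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple


def iter_aof_commands(stream: BinaryIO) -> Iterator[List[bytes]]:
    """Yield each RESP-encoded command in the stream as a list of byte strings."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.startswith(b"*"):
            continue  # skip annotation lines (e.g. "#TS:...") and blank lines
        argc = int(line[1:].strip())
        args: List[bytes] = []
        for _ in range(argc):
            header = stream.readline()  # expected form: b"$<length>\r\n"
            if not header.startswith(b"$"):
                raise ValueError(f"unexpected bulk string header: {header!r}")
            length = int(header[1:].strip())
            payload = stream.read(length + 2)  # argument bytes plus trailing CRLF
            args.append(payload[:length])
        yield args


def find_flush_commands(stream: BinaryIO) -> List[Tuple[int, str]]:
    """Return (command index, command name) for every FLUSHALL/FLUSHDB found."""
    hits: List[Tuple[int, str]] = []
    for index, args in enumerate(iter_aof_commands(stream)):
        if args and args[0].upper() in (b"FLUSHALL", b"FLUSHDB"):
            hits.append((index, args[0].upper().decode("ascii")))
    return hits


# In-memory demo: one SET followed by a FLUSHALL, encoded as RESP.
demo = io.BytesIO(
    b"*3\r\n$3\r\nSET\r\n$1\r\na\r\n$1\r\n1\r\n"
    b"*1\r\n$8\r\nFLUSHALL\r\n"
)
print(find_flush_commands(demo))
```

To scan the real file, open it in binary mode, for example open("/data/appendonlydir/appendonly.aof.1.incr.aof", "rb"), and pass the file object to find_flush_commands. Reading arguments by their declared length, rather than line by line, keeps values that contain newlines or "*" characters from being mistaken for commands.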
