Redis版本:v7.0.12
你好。
我已使用
ot-helm/redis-operator
在 Kubernetes 集群中部署了一个 Redis 集群,其值如下:
redisCluster:
redisSecret:
secretName: redis-password
secretKey: REDIS_PASSWORD
leader:
replicas: 3
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: test
operator: In
values:
- "true"
follower:
replicas: 3
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: test
operator: In
values:
- "true"
externalService:
enabled: true
serviceType: LoadBalancer
port: 6379
redisExporter:
enabled: true
storageSpec:
volumeClaimTemplate:
spec:
resources:
requests:
storage: 10Gi
nodeConfVolumeClaimTemplate:
spec:
resources:
requests:
storage: 1Gi
向集群添加几个密钥后,我停止部署 Redis 集群的主机(EC2 实例),然后再次启动它。重新启动 EC2 实例和 Redis 集群后,我在重新启动之前添加的几个键就会消失。
我启用了两个持久化方法(RDB 和 AOF),这是我针对 Redis 集群有关持久化的配置(默认):
config get dir # /data
config get dbfilename # dump.rdb
config get appendonly # yes
config get appendfilename # appendonly.aof
我注意到,在 Redis 中添加键/数据期间/之后,
/data/dump.rdb
和 /data/appendonlydir/appendonly.aof.1.incr.aof
(在我的主 Redis 集群领导者中)的大小增加,但是当我重新启动 EC2 实例时,/data/dump.rdb
得到返回到 0 字节,而 /data/appendonlydir/appendonly.aof.1.incr.aof
保持与重新启动之前相同的大小。
我可以通过 Grafana 仪表板中的屏幕截图来确认这一点,同时监控附加到 Redis 集群主要领导者的持久卷。据我了解,该卷包含 AOF 和 RDB 数据,直到 Redis 集群重新启动几秒钟后,RDB 数据才会被删除。 这是我正在使用的普罗米修斯指标,以防有人想知道:
sum(kubelet_volume_stats_used_bytes{namespace=~"test", persistentvolumeclaim="redis-cluster-leader-redis-cluster-leader-0"}/(1024*1024)) by (persistentvolumeclaim)
所以,Redis Cluster 实际上是使用 RDB 和 AOF 来备份数据,但是一旦重新启动(EC2 重新启动后),它就会丢失 RDB 数据,而 AOF 由于某种原因不足以检索键/数据.
以下是Redis集群重启时的日志:
ACL_MODE is not true, skipping ACL file modification
Starting redis service in cluster mode.....
12:C 17 Sep 2024 00:49:39.351 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
12:C 17 Sep 2024 00:49:39.351 # Redis version=7.0.12, bits=64, commit=00000000, modified=0, pid=12, just started
12:C 17 Sep 2024 00:49:39.351 # Configuration loaded
12:M 17 Sep 2024 00:49:39.352 * monotonic clock: POSIX clock_gettime
12:M 17 Sep 2024 00:49:39.353 * Node configuration loaded, I'm ef200bc9befd1c4fb0f6e5acbb1432002a7c2822
12:M 17 Sep 2024 00:49:39.353 * Running mode=cluster, port=6379.
12:M 17 Sep 2024 00:49:39.353 # Server initialized
12:M 17 Sep 2024 00:49:39.355 * Reading RDB base file on AOF loading...
12:M 17 Sep 2024 00:49:39.355 * Loading RDB produced by version 7.0.12
12:M 17 Sep 2024 00:49:39.355 * RDB age 2469 seconds
12:M 17 Sep 2024 00:49:39.355 * RDB memory usage when created 1.51 Mb
12:M 17 Sep 2024 00:49:39.355 * RDB is base AOF
12:M 17 Sep 2024 00:49:39.355 * Done loading RDB, keys loaded: 0, keys expired: 0.
12:M 17 Sep 2024 00:49:39.355 * DB loaded from base file appendonly.aof.1.base.rdb: 0.001 seconds
12:M 17 Sep 2024 00:49:39.598 * DB loaded from incr file appendonly.aof.1.incr.aof: 0.243 seconds
12:M 17 Sep 2024 00:49:39.598 * DB loaded from append only file: 0.244 seconds
12:M 17 Sep 2024 00:49:39.598 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
12:M 17 Sep 2024 00:49:39.599 * Ready to accept connections
12:M 17 Sep 2024 00:49:41.611 # Cluster state changed: ok
12:M 17 Sep 2024 00:49:46.592 # Cluster state changed: fail
12:M 17 Sep 2024 00:50:02.258 * DB saved on disk
12:M 17 Sep 2024 00:50:21.376 # Cluster state changed: ok
12:M 17 Sep 2024 00:51:26.284 * Replica 192.168.58.43:6379 asks for synchronization
12:M 17 Sep 2024 00:51:26.284 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '995d7ac6eedc09d95c4fc184519686e9dc8f9b41', my replication IDs are '654e768d51433cc24667323f8f884c66e8e55566' and '0000000000000000000000000000000000000000')
12:M 17 Sep 2024 00:51:26.284 * Replication backlog created, my new replication IDs are 'de979d9aa433bf37f413a64aff751ed677794b00' and '0000000000000000000000000000000000000000'
12:M 17 Sep 2024 00:51:26.284 * Delay next BGSAVE for diskless SYNC
12:M 17 Sep 2024 00:51:31.195 * Starting BGSAVE for SYNC with target: replicas sockets
12:M 17 Sep 2024 00:51:31.195 * Background RDB transfer started by pid 218
218:C 17 Sep 2024 00:51:31.196 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
12:M 17 Sep 2024 00:51:31.196 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
12:M 17 Sep 2024 00:51:31.202 * Background RDB transfer terminated with success
12:M 17 Sep 2024 00:51:31.202 * Streamed RDB transfer with replica 192.168.58.43:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
12:M 17 Sep 2024 00:51:31.203 * Synchronization with replica 192.168.58.43:6379 succeeded
这是
INFO PERSISTENCE
redis-cli 命令的输出,添加一些数据后:
# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1726552373
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_saves:5
rdb_last_cow_size:1093632
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:37092089
aof_base_size:89
aof_pending_rewrite:0
aof_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
如果有人想知道,持久卷已正确附加到
/data
安装路径中的 Redis 集群。以下是主要 Redis 集群领导者的 YAML 定义的片段(这是通过 Helm & Redis Operator 自动生成的):
apiVersion: v1
kind: Pod
metadata:
name: redis-cluster-leader-0
namespace: test
[...]
spec:
containers:
[...]
volumeMounts:
- mountPath: /node-conf
name: node-conf
- mountPath: /data
name: redis-cluster-leader
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-7ds8c
readOnly: true
[...]
volumes:
- name: node-conf
persistentVolumeClaim:
claimName: node-conf-redis-cluster-leader-0
- name: redis-cluster-leader
persistentVolumeClaim:
claimName: redis-cluster-leader-redis-cluster-leader-0
[...]
我已经在这个问题上花了几天时间,我到处寻找,但徒劳无功。我将不胜感激任何形式的帮助。如果需要任何其他信息,我也将随时为您服务。非常感谢。
我注意到
/data/appendonlydir/appendonly.aof.1.incr.aof
最后包含 FLUSHALL
命令,因此,我通过将 rename-command FLUSHALL ""
添加到我的 redis.conf
中解决了这个问题。