Rancher cluster creation stuck waiting for cluster agent to connect

Question

I am trying to set up a Rancher cluster using the Docker install option.
Rancher version: 2.8.3
Server version: RedHat 8.9

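After setting up podman on the server, I started the rancher/rancher container.
I then created a new cluster with the default settings.

When I then add a node using the registration command, the node shows up in the dashboard but stays stuck waiting for the cluster agent to connect.

The rancher-system-agent log on the node being registered keeps reporting that the secret is too old.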

Cluster provisioning log

[INFO ] waiting for at least one control plane, etcd, and worker node to be registered
[INFO ] configuring bootstrap node(s) custom-8b637c9cd841: waiting for bootstrap etcd to be available
[INFO ] configuring bootstrap node(s) custom-8b637c9cd841: waiting for agent to check in and apply initial plan
[INFO ] configuring bootstrap node(s) custom-8b637c9cd841: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) custom-8b637c9cd841: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) custom-8b637c9cd841: waiting for probes: calico
[INFO ] configuring bootstrap node(s) custom-8b637c9cd841: waiting for cluster agent to connect

rancher-system-agent log

rancher-system-agent[13135]: time="2024-04-30T14:27:40+02:00" level=info msg="[Applyinator] Running command: sh [-c rke2 etcd-snapshot list --etcd-s3=false 2>/dev/null]"
rancher-system-agent[13135]: time="2024-04-30T14:27:40+02:00" level=info msg="[12cb26ca54456920e77a29191a815c3d6fed7af28c7a8c072a4d5c214607e6fe_0:stdout]: Name Location Size Created"
rancher-system-agent[13135]: time="2024-04-30T14:27:40+02:00" level=info msg="[Applyinator] Command sh [-c rke2 etcd-snapshot list --etcd-s3=false 2>/dev/null] finished with err: <nil> and exit code: 0"
rancher-system-agent[13135]: time="2024-04-30T14:27:40+02:00" level=info msg="[K8s] updated plan secret fleet-default/custom-8b637c9cd841-machine-plan with feedback"
rancher-system-agent[13135]: time="2024-04-30T14:27:45+02:00" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (14820 vs 18024)"
rancher-system-agent[13135]: time="2024-04-30T14:27:45+02:00" level=error msg="error syncing 'fleet-default/custom-8b637c9cd841-machine-plan': handler secret-watch: secret received was too old, requeuing"
rancher-system-agent[13135]: W0430 14:29:05.831181   13135 reflector.go:456] pkg/mod/github.com/rancher/[email protected]/tools/cache/reflector.go:231: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 35; INTERNAL_ERROR; received from peer") has prevented the request from succeeding

Could the secret problem be what causes the error?

rancher podman rancher-rke
1 Answer

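After running the registration command on the machine (the one the downstream cluster is supposed to run on), you can start debugging rke2-agent and rke2-server. To do that, check the state of RKE2 (or K3s) on that machine. For example: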

  • Check the status of the rke2-server service:
    systemctl status rke2-server.service
  • Check the kubelet logs. On an RKE2 node you can find them here:
    /var/lib/rancher/rke2/agent/logs
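
The two checks above can be combined into a small shell sketch. The function names `unit_states` and `tail_kubelet_logs` are hypothetical helpers made up for illustration; the unit names and the log directory are the ones mentioned in this thread. It assumes bash and a systemd host, and it only reads state, it changes nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the unit names and the log path come from the thread.

# Print the systemd state (for example "active", "inactive", "failed")
# of every unit passed as an argument.
unit_states() {
  local unit
  for unit in "$@"; do
    printf '%s: %s\n' "$unit" "$(systemctl is-active "$unit" 2>/dev/null || true)"
  done
}

# Print the last N lines (default 20) of every *.log file in the given
# directory (default: the RKE2 agent log directory named above).
tail_kubelet_logs() {
  local lines="${1:-20}"
  local dir="${2:-/var/lib/rancher/rke2/agent/logs}"
  local f
  for f in "$dir"/*.log; do
    # If the glob matched nothing, $f is the literal pattern and does not exist.
    [ -e "$f" ] || { echo "no log files in $dir"; return 1; }
    echo "== $f"
    tail -n "$lines" "$f"
  done
}
```

On the node being registered this could be run as `unit_states rke2-server.service rke2-agent.service rancher-system-agent.service` followed by `tail_kubelet_logs 50`, most likely as root, since the log directory is usually not world-readable.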

For more detailed guidance, consult the official RKE2 or K3s documentation to pin down the problem.
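
To see whether the "secret received was too old" messages from the question keep recurring, the agent log can be filtered for them. The following is a minimal sketch: `stale_secret_versions` is a hypothetical helper name, and reading the two numbers as "received" and "last operated on" is an interpretation of the message wording, not something confirmed by the Rancher documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads log lines on stdin and prints one line per
# stale-secret error, showing the two numbers from "(X vs Y)".
# Assumption: X belongs to the secret just received, Y to the last one operated on.
stale_secret_versions() {
  sed -n 's/.*older than the last secret operated on\. (\([0-9][0-9]*\) vs \([0-9][0-9]*\)).*/received=\1 last=\2/p'
}
```

Assuming the agent runs as the systemd unit `rancher-system-agent`, as the log prefix in the question suggests, it could be used as `journalctl -u rancher-system-agent --no-pager | stale_secret_versions`. For the error line quoted in the question it prints `received=14820 last=18024`.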
