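I've been running k3s on a bunch of Raspberry Pi 4s and 3s for a while now, but one node keeps failing. Here is my setup:
Servers: one Raspberry Pi 4 (8 GB) and two Raspberry Pi 4s (4 GB). Workers: three Raspberry Pi 3Bs.
The second Raspberry Pi 4 (4 GB) is the one that keeps failing. All six nodes are connected to gigabit Ethernet and Samsung 500 GB SSDs.
I'm running an HA setup with an external SQL database hosted on my NAS (192.168.1.200), as shown below.
Here are some of the errors I'm getting.
When I run
sudo systemctl status k3s
I get (user/password changed for privacy reasons):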
k3s.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Sun 2021-07-04 01:59:29 BST; 609ms ago
Docs: https://k3s.io
Process: 14637 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
Process: 14639 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 14640 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Process: 14641 ExecStart=/usr/local/bin/k3s server --tls-san 192.168.1.100 --datastore-endpoint mysql://user:pass@tcp(192.168.1.200:3306)/k3s --disable traefik (code=exited, status=1/FAILURE)
Main PID: 14641 (code=exited, status=1/FAILURE)
When I run
journalctl -xe
I get:
-- Subject: A start job for unit k3s.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit k3s.service has begun execution.
--
-- The job identifier is 325190.
Jul 04 02:13:46 node50 sh[18386]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jul 04 02:13:46 node50 sh[18386]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jul 04 02:13:47 node50 k3s[18390]: time="2021-07-04T02:13:47.365627432+01:00" level=info msg="Starting k3s v1.19.12+k3s1 (559d0c47)"
Jul 04 02:13:47 node50 k3s[18390]: time="2021-07-04T02:13:47.366403382+01:00" level=info msg="Cluster bootstrap already complete"
Jul 04 02:13:47 node50 k3s[18390]: time="2021-07-04T02:13:47.418216205+01:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: Error 1129: Host is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'"
Jul 04 02:13:47 node50 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit k3s.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Jul 04 02:13:47 node50 systemd[1]: k3s.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit k3s.service has entered the 'failed' state with result 'exit-code'.
Jul 04 02:13:47 node50 systemd[1]: Failed to start Lightweight Kubernetes.
-- Subject: A start job for unit k3s.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit k3s.service has finished with a failure.
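To get a feel for how often the host gets blocked, the journal output can be scanned for the fatal kine line. The following is a minimal Python sketch, not part of k3s: the function name `blocked_times` and the `year` parameter are made up for illustration (the journal prefix carries no year), and the sample text is copied from the log above.

```python
import re
from datetime import datetime

# Journal prefix ("Jul 04 02:13:47 node50 k3s[18390]:") followed by a
# fatal k3s message that mentions MySQL Error 1129.
BLOCKED = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+k3s\[\d+\]:"
    r".*level=fatal.*Error 1129"
)


def blocked_times(journal_text, year=2021):
    """Return a datetime for every journal line reporting Error 1129.

    `year` is an assumption supplied by the caller, because the short
    journal timestamp does not include it.
    """
    times = []
    for line in journal_text.splitlines():
        match = BLOCKED.match(line)
        if match:
            month, day, clock = match.groups()
            times.append(
                datetime.strptime(f"{year} {month} {day} {clock}",
                                  "%Y %b %d %H:%M:%S")
            )
    return times


# Three lines taken from the journalctl output in this post.
SAMPLE = """Jul 04 02:13:47 node50 k3s[18390]: time="2021-07-04T02:13:47.365627432+01:00" level=info msg="Starting k3s v1.19.12+k3s1 (559d0c47)"
Jul 04 02:13:47 node50 k3s[18390]: time="2021-07-04T02:13:47.366403382+01:00" level=info msg="Cluster bootstrap already complete"
Jul 04 02:13:47 node50 k3s[18390]: time="2021-07-04T02:13:47.418216205+01:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: Error 1129: Host is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'"
"""

for stamp in blocked_times(SAMPLE):
    print(stamp.isoformat())
```

On the node itself the text could come from `journalctl -u k3s --no-pager`. The gaps between successive timestamps would show whether the block comes back at a steady interval after each flush.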
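Based on this error, I logged in to the MySQL instance and ran
mysqladmin flush-hosts
which fixed the problem for a few hours, and then it happened again. So I'm a bit confused as to why this only happens on one Pi. The Pi otherwise works fine and runs Docker and other programs without any issues.
I also increased the max connections in my.cnf from 100 to 10000.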
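One thing to double-check here: per the MySQL documentation, Error 1129 is raised when a host exceeds `max_connect_errors` (interrupted connection attempts), which is a separate setting from `max_connections`. The sketch below is a rough Python check of which of the two is actually set in an option file. The helper name `connection_limits` and the sample file contents are assumptions for illustration. Settings may also live in included files or be changed at runtime, so the value reported by the running server (`SHOW VARIABLES LIKE 'max_connect_errors';`) is the authoritative one.

```python
import configparser


def connection_limits(cnf_text):
    """Report max_connections / max_connect_errors from the [mysqld] section.

    Hypothetical helper: it only reads the text it is given and does not
    follow !include or !includedir directives.
    """
    # configparser cannot parse MySQL's !include directives, so drop them.
    cleaned = "\n".join(
        line for line in cnf_text.splitlines()
        if not line.lstrip().startswith("!")
    )
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare options without a value
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat '%' literally
        delimiters=("=",),
    )
    parser.read_string(cleaned)

    found = {}
    if parser.has_section("mysqld"):
        for key, value in parser.items("mysqld"):
            # MySQL accepts dashes and underscores interchangeably.
            name = key.replace("-", "_")
            if name in ("max_connections", "max_connect_errors"):
                found[name] = value
    return found


# Assumed file contents, mirroring the change described in this post.
EXAMPLE = """
[mysqld]
max_connections = 10000
"""

print(connection_limits(EXAMPLE))
```

If only `max_connections` shows up, as in this assumed example, the error threshold is still whatever the server uses for `max_connect_errors`.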
Does anyone have any ideas?
https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.27.7+k3s2 sh - (for version v1.27.7+k3s2)