Readiness probe failing in the Eclipse Hono pods of the Cloud2Edge package


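I'm getting a bit desperate and hope someone can help me. A few months ago I installed the Eclipse Cloud2Edge package on a Kubernetes cluster by following the installation instructions: I created a PersistentVolume and ran the helm install command with these options.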

helm install -n $NS --wait --timeout 15m $RELEASE eclipse-iot/cloud2edge --set hono.prometheus.createInstance=false --set hono.grafana.enabled=false --dependency-update --debug

The YAML for the PersistentVolume is shown below. I created it in the same namespace in which I installed the package.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-device-registry
spec:
  accessModes: 
    - ReadWriteOnce
  capacity:
    storage: 1Mi
  hostPath:
    path: /mnt/
    type: Directory

Everything worked perfectly, with all pods ready and running, until one day the cluster crashed and some of the pods stopped working.

The output of kubectl get pods -n $NS is as follows:

NAME                                          READY   STATUS    RESTARTS   AGE
ditto-mongodb-7b78b468fb-8kshj                1/1     Running   0          50m
dt-adapter-amqp-vertx-6699ccf495-fc8nx        0/1     Running   0          50m
dt-adapter-http-vertx-545564ff9f-gx5fp        0/1     Running   0          50m
dt-adapter-mqtt-vertx-58c8975678-k5n49        0/1     Running   0          50m
dt-artemis-6759fb6cb8-5rq8p                   1/1     Running   1          50m
dt-dispatch-router-5bc7586f76-57dwb           1/1     Running   0          50m
dt-ditto-concierge-f6d5f6f9c-pfmcw            1/1     Running   0          50m
dt-ditto-connectivity-f556db698-q89bw         1/1     Running   0          50m
dt-ditto-gateway-589d8f5596-59c5b             1/1     Running   0          50m
dt-ditto-nginx-897b5bc76-cx2dr                1/1     Running   0          50m
dt-ditto-policies-75cb5c6557-j5zdg            1/1     Running   0          50m
dt-ditto-swaggerui-6f6f989ccd-jkhsk           1/1     Running   0          50m
dt-ditto-things-79ff869bc9-l9lct              1/1     Running   0          50m
dt-ditto-thingssearch-58c5578bb9-pwd9k        1/1     Running   0          50m
dt-service-auth-698d4cdfff-ch5wp              1/1     Running   0          50m
dt-service-command-router-59d6556b5f-4nfcj    0/1     Running   0          50m
dt-service-device-registry-7cf75d794f-pk9ct   0/1     Running   0          50m

The failing pods all show the same error in the output of kubectl describe pod POD_NAME -n $NS:

Events:
Type     Reason     Age                    From               Message
----     ------     ----                   ----               -------
Normal   Scheduled  53m                    default-scheduler  Successfully assigned digitaltwins/dt-service-command-router-59d6556b5f-4nfcj to node1
Normal   Pulled     53m                    kubelet            Container image "index.docker.io/eclipse/hono-service-command-router:1.8.0" already present on machine
Normal   Created    53m                    kubelet            Created container service-command-router
Normal   Started    53m                    kubelet            Started container service-command-router
Warning  Unhealthy  52m                    kubelet            Readiness probe failed: Get "https://10.244.1.89:8088/readiness": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  2m58s (x295 over 51m)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

According to this, the readinessProbe is failing. The YAML definition of the affected deployments defines the readinessProbe as follows:

readinessProbe:
  failureThreshold: 3
  httpGet:
     path: /readiness
     port: health
     scheme: HTTPS
  initialDelaySeconds: 45
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
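
For reference, the same endpoint can be queried by hand instead of waiting for the kubelet. The following is a minimal sketch, not part of the Cloud2Edge package: the function names probe_ok and check_readiness are hypothetical, port 8088 is taken from the events above, and it assumes kubectl and curl are available on the workstation. The kubelet treats any HTTP status from 200 to 399 as a successful probe and does not verify the certificate for HTTPS probes, which the sketch mirrors.

```shell
#!/usr/bin/env bash

# Mirror the kubelet's rule for HTTP probes: 200-399 counts as success.
probe_ok() {
  local status="$1"
  [ "$status" -ge 200 ] && [ "$status" -lt 400 ]
}

# Query /readiness of one pod through a temporary local port-forward.
# Arguments: pod name, namespace. Port 8088 is the health port seen in the events.
check_readiness() {
  local pod="$1" ns="$2" status pf_pid
  kubectl port-forward -n "$ns" "pod/$pod" 8088:8088 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2  # crude wait for the tunnel to come up
  # -k skips certificate verification, as the kubelet does for HTTPS probes
  status=$(curl -k -s -o /dev/null -w '%{http_code}' https://localhost:8088/readiness)
  kill "$pf_pid" 2>/dev/null
  echo "HTTP $status"
  probe_ok "$status"
}
```

Calling check_readiness dt-service-command-router-59d6556b5f-4nfcj "$NS" prints the status code and returns non-zero for a response such as 503. Dropping -o /dev/null from the curl call shows the response body, which may name the failing check.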

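I tried increasing these values, raising the delay to 600 and the timeout to 10. I also tried uninstalling the package and installing it again, but nothing changed: the installation fails because the pods never become ready and the timeout expires. I also exposed port 8088 (health) and called /readiness with wget; the result was still 503. The livenessProbe, on the other hand, works fine when I test it. I also tried resetting the cluster. First I manually deleted everything in it, then I ran the following commands: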

sudo kubeadm reset
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
sudo systemctl stop kubelet
sudo systemctl stop docker
sudo rm -rf /var/lib/cni/
sudo rm -rf /var/lib/kubelet/*
sudo rm -rf /etc/cni/
sudo ifconfig cni0 down
sudo ifconfig flannel.1 down
sudo ifconfig docker0 down
sudo ip link set cni0 down
sudo brctl delbr cni0  
sudo systemctl start docker
sudo kubeadm init --apiserver-advertise-address=192.168.44.11 --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl --kubeconfig $HOME/.kube/config apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

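The cluster itself seems to work, since the Eclipse Ditto part has no problems; only the Eclipse Hono part is affected. Here is some more information in case it is useful.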

Output of kubectl logs dt-service-command-router-b654c8dcb-s2g6t -n $NS:

12:30:06.340 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.101:44142 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:06.756 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.100:46550 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:07.876 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.102:40706 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#258] to connect to server [dt-service-device-registry:5671, role: Device Registration]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.3]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Device Registration]
12:30:08.339 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Device Registration]: Failed to create SSL connection
12:30:08.339 [vert.x-eventloop-thread-1] WARN o.e.h.client.impl.HonoConnectionImpl - attempt [#258] to connect to server [dt-service-device-registry:5671, role: Device Registration] failed
javax.net.ssl.SSLHandshakeException: Failed to create SSL connection

Output of kubectl logs dt-adapter-amqp-vertx-74d69cbc44-7kmdq -n $NS:

12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#19] to connect to server [dt-service-device-registry:5671, role: Credentials]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.3]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Credentials]
12:19:36.711 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Credentials]: Failed to create SSL connection
12:19:36.712 [vert.x-eventloop-thread-0] WARN o.e.h.client.impl.HonoConnectionImpl - attempt [#19] to connect to server [dt-service-device-registry:5671, role: Credentials] failed
javax.net.ssl.SSLHandshakeException: Failed to create SSL connection

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:20:18Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}

Thanks in advance!

kubernetes readinessprobe eclipse-hono eclipse-ditto
1 Answer
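Judging by the "Failed to create SSL connection" output, I assume you have run into the dreaded "the demo certificates included in the Hono chart have expired" problem. The Cloud2Edge package chart is currently being updated (https://github.com/eclipse/packages/pull/337) to use the most recent versions of the Ditto and Hono charts, which include fresh certificates that are valid for another two years. Once that PR has been merged and the Eclipse Packages chart repository has been rebuilt, you should be able to run helm repo update and then (hopefully) install the c2e package successfully.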
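
To confirm that an expired certificate is the cause, the end date of the certificate served by the device registry can be inspected. This is only a sketch under stated assumptions: cert_days_left is a hypothetical helper, GNU date is required, and the service name and port are taken from the log lines in the question (dt-service-device-registry:5671). The openssl and kubectl steps are shown as comments because they need access to the cluster.

```shell
#!/usr/bin/env bash

# Print the whole days between a reference time and the end date reported by
# `openssl x509 -noout -enddate` (for example "notAfter=Jan  1 00:00:00 2022 GMT").
# Arguments: the notAfter line, and optionally a reference time in epoch seconds
# (defaults to now). A result of zero or below means the certificate has expired
# or expires within a day.
cert_days_left() {
  local enddate="${1#notAfter=}"
  local now="${2:-$(date -u +%s)}"
  local end
  end=$(date -u -d "$enddate" +%s) || return 1
  echo $(( (end - now) / 86400 ))
}

# Usage sketch. It assumes a port-forward to the registry's AMQPS port is
# already running in another terminal:
#   kubectl port-forward -n "$NS" svc/dt-service-device-registry 5671:5671
#
# enddate=$(openssl s_client -connect localhost:5671 </dev/null 2>/dev/null \
#             | openssl x509 -noout -enddate)
# cert_days_left "$enddate"
```

If the result is zero or negative, the demo certificate has run out, and no change to the readiness probe settings will help until the chart ships new certificates.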
