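I have a kops cluster with a maximum of 75 nodes, and I added cluster autoscaler to it. It uses kubenet networking. It has now stopped working: scale-down no longer happens.
The cluster runs at maximum capacity, with all 75 nodes up, even when there is almost no load. I don't know where to start troubleshooting.
See the following errors from the cluster autoscaler pod: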
I0222 01:45:14.327164 1 static_autoscaler.go:97] Starting main loop
W0222 01:45:14.770818 1 static_autoscaler.go:150] Cluster is not ready for autoscaling
I0222 01:45:15.043126 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:17.121507 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:19.126665 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:21.327581 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:23.331802 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:24.775124 1 static_autoscaler.go:97] Starting main loop
W0222 01:45:25.085442 1 static_autoscaler.go:150] Cluster is not ready for autoscaling
Autoscaling used to work fine before this.
Update: running kops validate cluster also shows the following errors:
VALIDATION ERRORS
KIND NAME MESSAGE
Node ip-172-20-32-173.ec2.internal node "ip-172-20-32-173.ec2.internal" is not ready
...
I0221 22:16:02.688911 2403 node_conditions.go:60] node "ip-172-20-51-238.ec2.internal" not ready: &NodeCondition{Type:NetworkUnavailable,Status:True,LastHeartbeatTime:2019-02-21 22:15:56 -0500 EST,LastTransitionTime:2019-02-21 22:15:56 -0500 EST,Reason:NoRouteCreated,Message:RouteController failed to create a route,}
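To see how many nodes are affected, the validation output can be filtered for nodes whose condition carries Reason:NoRouteCreated. The following is only a minimal sketch: the function name `nodes_without_route` and the regular expression are my own illustration, and it assumes the log lines have the same shape as the one pasted above.

```python
import re

# Matches lines shaped like the node_conditions.go line above (assumed format).
NOT_READY = re.compile(
    r'node "(?P<name>[^"]+)" not ready: &NodeCondition\{(?P<body>.*)\}'
)


def nodes_without_route(log_text):
    """Return names of nodes reported not ready with Reason:NoRouteCreated."""
    names = []
    for line in log_text.splitlines():
        match = NOT_READY.search(line)
        if match and "Reason:NoRouteCreated" in match.group("body"):
            names.append(match.group("name"))
    return names


sample = (
    'I0221 22:16:02.688911 2403 node_conditions.go:60] node '
    '"ip-172-20-51-238.ec2.internal" not ready: &NodeCondition{'
    'Type:NetworkUnavailable,Status:True,'
    'LastHeartbeatTime:2019-02-21 22:15:56 -0500 EST,'
    'LastTransitionTime:2019-02-21 22:15:56 -0500 EST,'
    'Reason:NoRouteCreated,Message:RouteController failed to create a route,}'
)

print(nodes_without_route(sample))
```

Feeding it the full saved output of the validate run would give the list of nodes that never received a route.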
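I found the problem: my cluster had gone into an unhealthy state because of this limitation on AWS VPC route tables. The cluster scaled up to 75 nodes, then became unhealthy and could not scale back down.
From the link:
An important limitation when using kubenet networking is that an AWS routing table cannot have more than 50 entries, which sets a limit of 50 nodes per cluster.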
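The arithmetic behind this can be sketched as follows, assuming kubenet needs one route table entry per node and using the 50-entry limit quoted above. The `reserved_routes` parameter is a hypothetical knob for entries taken by non-node routes; it is not something the quoted text specifies.

```python
# Limit quoted in the text above; treat it as an assumption for your account.
AWS_ROUTE_TABLE_LIMIT = 50


def nodes_left_without_route(node_count,
                             route_limit=AWS_ROUTE_TABLE_LIMIT,
                             reserved_routes=0):
    """Estimate how many nodes cannot get a route, at one route per node."""
    usable = max(route_limit - reserved_routes, 0)
    return max(node_count - usable, 0)


print(nodes_left_without_route(75))  # 25 nodes over the limit
```

With 75 nodes and a 50-entry table, at least 25 nodes end up without a route, which matches the NoRouteCreated errors. While those nodes are not ready, the autoscaler reports the cluster as not ready for autoscaling, as in the log above.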