Is it normal for a bokeh server on Kubernetes to restart periodically?


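I am serving a bokeh dashboard from a docker container that runs on kubernetes. I can access the dashboard remotely without any problem. However, I noticed that the pod containing my bokeh serve code restarts a lot: 14 times in the past 2 hours. Sometimes its status comes back as "CrashLoopBackOff", and at other times it is "Running" as normal.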

My question is: is there something about the way bokeh serve works that requires kubernetes to restart it this often? Is it related to memory (OOMKilled)?

Here is part of my kubectl describe pod output:

Name:               bokeh-744d4bc9d-5pkzq
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               10.183.226.51/10.183.226.51
Start Time:         Tue, 18 Feb 2020 11:55:44 +0000
Labels:             name=bokeh
                    pod-template-hash=744d4bc9d
Annotations:        kubernetes.io/psp: xyz-privileged-psp
Status:             Running
IP:                 172.30.255.130
Controlled By:      ReplicaSet/bokeh-744d4bc9d
Containers:
  dashboard-application:
    Container ID:   containerd://16d10dc5dd89235b0xyz2b5b31f8e313f3f0bb7efe82a12e00c1f01708e2f894
    Image:          us.icr.io/oss-data-science-np-dal/bokeh:118
    Image ID:       us.icr.io/oss-data-science-np-dal/bokeh@sha256:037a5b52a6e7c792fdxy80b01e29772dbfc33b10e819774462bee650cf0da
    Port:           5006/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 18 Feb 2020 14:25:36 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 18 Feb 2020 14:15:26 +0000
      Finished:     Tue, 18 Feb 2020 14:23:54 +0000
    Ready:          True
    Restart Count:  17
    Limits:
      cpu:     800m
      memory:  600Mi
    Requests:
      cpu:        600m
      memory:     400Mi
    Liveness:     http-get http://:5006/ delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:5006/ delay=10s timeout=1s period=3s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cjhfk (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-cjhfk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cjhfk
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                 node.kubernetes.io/unreachable:NoExecute for 600s
Events:
  Type     Reason     Age                    From                    Message
  ----     ------     ----                   ----                    -------
  Warning  Unhealthy  36m (x219 over 150m)   kubelet, 10.183.226.51  Liveness probe failed: Get http://172.30.255.130:5006/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    21m (x34 over 134m)    kubelet, 10.183.226.51  Back-off restarting failed container
  Warning  Unhealthy  10m (x72 over 150m)    kubelet, 10.183.226.51  Readiness probe failed: Get http://172.30.255.130:5006/RCA: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  6m4s (x957 over 150m)  kubelet, 10.183.226.51  Readiness probe failed: Get http://172.30.255.130:5006/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  68s (x23 over 147m)    kubelet, 10.183.226.51  Liveness probe failed: Get http://172.30.255.130:5006/RCA: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I'm new to k8s, so any information you can spare on this kind of issue would be much appreciated!

kubernetes bokeh
Answers

1 vote

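If a container allocates more memory than its limit, the container becomes a candidate for termination. If the container continues to consume memory beyond its limit, it is terminated. If a terminated container can be restarted, the kubelet restarts it, as with any other type of runtime failure. This is described in the Kubernetes documentation.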
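The pod description in the question shows this happening: Last State is Terminated with Reason OOMKilled and Exit Code 137. By the usual Unix convention, an exit code above 128 means the process was killed by signal number (code - 128), and 137 - 128 = 9, which is SIGKILL. A minimal sketch of that arithmetic follows; the helper name describe_exit_code is made up for illustration and is not part of any Kubernetes tooling.

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if exit_code == 0:
        return "exited normally"
    if exit_code > 128:
        signum = exit_code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with error status {exit_code}"


# Exit code taken from the "Last State" section of the pod description.
print(describe_exit_code(137))
```

On Linux this reports that the process was killed by SIGKILL, which matches what the kernel's OOM killer sends when a container goes over its memory limit.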


0 votes

OOMKilled means that your pod used too much RAM and was killed to avoid disrupting the other workloads running on the node.

If feasible, you can edit your code to use less RAM, or you can increase limits.memory.
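As a rough sketch of the second option, the snippet below builds a JSON patch body that raises only the memory limit of one container. The container name dashboard-application comes from the pod description in the question; the new value of 1Gi and the helper name memory_limit_patch are assumptions for illustration only, so pick a value based on what the dashboard really needs.

```python
import json


def memory_limit_patch(container_name: str, memory_limit: str) -> str:
    """Build a patch body that sets limits.memory for one container."""
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Containers in the pod template are matched by name.
                            "name": container_name,
                            "resources": {"limits": {"memory": memory_limit}},
                        }
                    ]
                }
            }
        }
    }
    return json.dumps(patch)


# "1Gi" is a hypothetical new limit, not a recommendation.
print(memory_limit_patch("dashboard-application", "1Gi"))
```

The printed JSON could then be passed to kubectl patch deployment bokeh --patch '...', assuming the Deployment is named bokeh (the question only shows the ReplicaSet name bokeh-744d4bc9d, so check the real name first). Editing the manifest and re-applying it works just as well.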

You generally want requests = limits, unless your pod does some heavy work at startup and then sits idle.
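For context, the pod in the question has requests (600m CPU, 400Mi memory) below its limits (800m CPU, 600Mi memory), which is why it is reported as QoS Class Burstable. The scheduler places pods according to their requests, so a Burstable pod can grow into memory that was never reserved for it. The sketch below is a simplified version of the classification rule for a pod with a single container; the function name qos_class is made up, and it compares quantity strings literally, so "1Gi" and "1024Mi" would be treated as different.

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container pod."""
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed needs a CPU and a memory limit, each equal to its request.
    # A missing request is treated as equal to the limit, which is the default.
    guaranteed = all(
        resource in limits
        and requests.get(resource, limits[resource]) == limits[resource]
        for resource in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


# Values from the pod description in the question.
print(qos_class({"cpu": "600m", "memory": "400Mi"},
                {"cpu": "800m", "memory": "600Mi"}))

# The same pod with requests raised to match the limits.
print(qos_class({"cpu": "800m", "memory": "600Mi"},
                {"cpu": "800m", "memory": "600Mi"}))
```

The first call prints Burstable, matching the describe output, and the second prints Guaranteed.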

You may want to take a look at the official documentation.
