K8S groundnuty/k8s-wait-for image fails to start as an init container (with Helm)

Question

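I am having a problem with the image groundnuty/k8s-wait-for. The project is on GitHub and the repository is on Docker Hub.

I am fairly sure there is an error in the command arguments, because the init container fails with Init:CrashLoopBackOff.

About the image: it is meant for init containers that need to delay a Pod's startup. The script in the image waits for a pod or job to complete; once it has, the main container and all replicas are allowed to start.

In my example, it should wait for the job named {{ .Release.Name }}-os-server-migration-{{ .Release.Revision }} to finish and, once it detects that the job is complete, let the main containers start. Helm templates are used.

As I understand it, the job name is {{ .Release.Name }}-os-server-migration-{{ .Release.Revision }}, and the second command argument of the init container in deployment.yml needs to be the same so that the init container can depend on the named job. Any other opinions on, or experience with, this approach?

The templates are attached.

DEPLOYMENT.YML: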

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-os-{{ .Release.Revision }}
  namespace: {{ .Values.namespace }}
  labels:
    app: {{ .Values.fullname }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Values.fullname }}
  template:
    metadata:
      labels:
        app: {{ .Values.fullname }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
      initContainers:
        - name: "{{ .Chart.Name }}-init"
          image: "groundnuty/k8s-wait-for:v1.3"
          imagePullPolicy: "{{ .Values.init.pullPolicy }}"
          args:
            - "job"
            - "{{ .Release.Name }}-os-server-migration-{{ .Release.Revision }}"

JOB.YML:

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-os-server-migration-{{ .Release.Revision }}
  namespace: {{ .Values.migration.namespace }}
spec:
  backoffLimit: {{ .Values.migration.backoffLimit }}
  template:
    spec:
      {{- with .Values.migration.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: {{ .Values.migration.fullname }}
          image: "{{ .Values.migration.image.repository }}:{{ .Values.migration.image.tag }}"
          imagePullPolicy: {{ .Values.migration.image.pullPolicy }}
          command:
            - sh
            - /app/migration-entrypoint.sh
      restartPolicy: {{ .Values.migration.restartPolicy }}

Logs:

  Normal   Scheduled  46s                default-scheduler  Successfully assigned development/octopus-dev-release-os-1-68cb9549c8-7jggh to minikube
  Normal   Pulled     41s                kubelet            Successfully pulled image "groundnuty/k8s-wait-for:v1.3" in 4.277517553s
  Normal   Pulled     36s                kubelet            Successfully pulled image "groundnuty/k8s-wait-for:v1.3" in 3.083126925s
  Normal   Pulling    20s (x3 over 45s)  kubelet            Pulling image "groundnuty/k8s-wait-for:v1.3"
  Normal   Created    18s (x3 over 41s)  kubelet            Created container os-init
  Normal   Started    18s (x3 over 40s)  kubelet            Started container os-init
  Normal   Pulled     18s                kubelet            Successfully pulled image "groundnuty/k8s-wait-for:v1.3" in 1.827195139s
  Warning  BackOff    4s (x4 over 33s)   kubelet            Back-off restarting failed container

kubectl get all -n development

NAME                                                        READY   STATUS                  RESTARTS   AGE
pod/octopus-dev-release-os-1-68cb9549c8-7jggh   0/1     Init:CrashLoopBackOff   2          44s
pod/octopus-dev-release-os-1-68cb9549c8-9qbdv   0/1     Init:CrashLoopBackOff   2          44s
pod/octopus-dev-release-os-1-68cb9549c8-c8h5k   0/1     Init:Error              2          44s
pod/octopus-dev-release-os-migration-1-9wq76    0/1     Completed               0          44s
......
......
NAME                                                       COMPLETIONS   DURATION   AGE
job.batch/octopus-dev-release-os-migration-1   1/1           26s        44s

Answer 1

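For anyone facing the same problem, here is my solution.

The problem was that the containers in deployment.yaml had no permission to use the Kube API. As a result, the groundnuty/k8s-wait-for:v1.3 container could not check whether the job {{ .Release.Name }}-os-server-migration-{{ .Release.Revision }} had completed. That is why the init containers failed immediately with CrashLoopBackOff.

After adding a service account, a role, and a role binding, everything worked, and groundnuty/k8s-wait-for:v1.3 successfully waited for the (migration) job to finish before letting the main container run.

Below are the service account, role, and role binding that solved the problem.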

sa.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-migration
  namespace: development

role.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: migration-reader
rules:
  - apiGroups: ["batch","extensions"]
    resources: ["jobs"]
    verbs: ["get","watch","list"]

rolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: migration-reader
subjects:
- kind: ServiceAccount
  name: sa-migration
roleRef:
  kind: Role
  name: migration-reader
  apiGroup: rbac.authorization.k8s.io
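
One step the manifests above leave implicit: a pod only gets these permissions if it actually runs as that service account. The following is a minimal sketch of how the pod template in deployment.yml from the question might reference it. It assumes the ServiceAccount name sa-migration from sa.yaml; everything else is copied from the question, and unrelated fields are omitted.

```yaml
# Fragment of the Deployment from the question; only serviceAccountName is new.
spec:
  template:
    spec:
      # Assumption: the ServiceAccount created in sa.yaml above.
      # Without this field the pod runs as the namespace's "default" account.
      serviceAccountName: sa-migration
      initContainers:
        - name: "{{ .Chart.Name }}-init"
          image: "groundnuty/k8s-wait-for:v1.3"
          imagePullPolicy: "{{ .Values.init.pullPolicy }}"
          args:
            - "job"
            - "{{ .Release.Name }}-os-server-migration-{{ .Release.Revision }}"
```

Note that a Role only grants access inside its own namespace, so the Role and RoleBinding need to live in the namespace where the Job is created.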

Answer 2

I ran into this problem as well. It turned out that the pod lacked permission to run kubectl get queries.

I am adding this answer because it solves the problem without any .yaml files.

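Troubleshooting steps

Step 1

Go into the Pod -> click Logs -> select the init container to view the init container's logs.

Alternatively, you can fetch the container logs with

kubectl logs <pod-name> -c <container-name> -n <namespace>

In this case the logs looked like this: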

Error from server (Forbidden): jobs.batch "test-app-release-test-app-cli-1" is forbidden: User "system:serviceaccount:local:default" cannot get resource "jobs" in API group "batch" in the namespace "local"

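This means the pod lacks permission to run kubectl get queries.

The fix is to create a role that is allowed to read jobs and bind that role to the service account. The service account is visible in the log message: local:default. The format is <namespace>:<serviceaccount>.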

Step 2

Create the role

kubectl create role job-reader --verb=get --verb=list --verb=watch --resource=jobs --namespace=local

Step 3

Create the role binding

# This role binding allows "local:default" service account to read jobs in the "local" namespace.
# You need to already have a role named "job-reader" in that namespace.
kubectl create rolebinding read-jobs --role=job-reader --serviceaccount=local:default --namespace=local

That solved the problem!

Full source code

If you like, see the full source code here. The deployment.yaml file has the init container section.
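
For anyone who would rather keep the RBAC objects in the chart, the two kubectl create commands from steps 2 and 3 correspond roughly to the manifests below. This is a sketch, not taken from the linked source: it assumes the same names (job-reader, read-jobs), the local namespace, and the default service account shown in the error message.

```yaml
# Declarative equivalent of "kubectl create role job-reader ..." (step 2).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-reader
  namespace: local
rules:
  - apiGroups: ["batch"]        # jobs live in the "batch" API group
    resources: ["jobs"]
    verbs: ["get", "list", "watch"]
---
# Declarative equivalent of "kubectl create rolebinding read-jobs ..." (step 3).
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-jobs
  namespace: local
subjects:
  - kind: ServiceAccount
    name: default               # the account named in the Forbidden error
    namespace: local
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: job-reader
```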
