Deploying Loki on an AKS cluster with Ansible and Helm fails with "timed out waiting for the condition"


I am trying to deploy Loki on my AKS cluster using Ansible and Helm. My cluster is currently at version 1.24.10. I am using the following configuration:

Loki chart version: 5.0.0
Caddy version: 2.8.0

When I deploy Loki using the Grafana Helm chart (https://grafana.github.io/helm-charts), I get the following error in my pipeline:

    Failure when executing Helm command. Exited 1.
    stdout: Release "loki" does not exist. Installing it now.
    stderr: Error: timed out waiting for the condition
    stderr: |-
    Error: timed out waiting for the condition

After checking the logs of my Loki pods (loki-write, loki-read), which are in CrashLoopBackOff, I found the following error:

level=error ts=2023-04-19T15:46:46.224948881Z caller=log.go:171 msg="error running loki" err="mkdir /data: read-only file system\nerror creating index client\ngithub.com/grafana/loki/pkg/storage.(*store).storeForPeriod\n\t/src/loki/pkg/storage/store.go:270\ngithub.com/grafana/loki/pkg/storage.(*store).init\n\t/src/loki/pkg/storage/store.go:164\ngithub.com/grafana/loki/pkg/storage.NewStore\n\t/src/loki/pkg/storage/store.go:147\ngithub.com/grafana/loki/pkg/loki.(*Loki).initStore\n\t/src/loki/pkg/loki/modules.go:655\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:120\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:457\nmain.main\n\t/src/loki/cmd/loki/main.go:110\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nerror initialising module: store\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:122\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92\ngithub.com/grafana/loki/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:457\nmain.main\n\t/src/loki/cmd/loki/main.go:110\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"
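
For reference, a log like the one above can also be pulled from the same playbook instead of running kubectl by hand. This is a minimal sketch, assuming the `kubernetes.core` collection is installed; the pod name `loki-write-0` and the registered variable `loki_pod_log` are hypothetical and need to be replaced with real values.

```yaml
# Sketch only: pod name and registered variable are hypothetical.
- name: "Fetch logs from a crashing Loki pod"
  kubernetes.core.k8s_log:
    kubeconfig: "{{ kube_config }}"
    namespace: "{{ loki_namespace }}"
    name: loki-write-0          # hypothetical pod name; substitute the real one
  register: loki_pod_log

- name: "Show the last 20 log lines"
  ansible.builtin.debug:
    msg: "{{ loki_pod_log.log_lines[-20:] }}"
```

Running this right after the Helm task (with `ignore_errors` on the Helm task, if needed) would put the pod error next to the Helm timeout in the pipeline output.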

This is the deployment section of my main.yml file under tasks:

    - name: "Deploy loki chart version {{loki_chart_version}} on cluster: {{cluster_name}}"
      kubernetes.core.helm:
        atomic: no
        name: loki
        chart_ref: grafana/loki
        chart_version: "{{loki_chart_version}}"
        release_namespace: "{{loki_namespace}}"
        create_namespace: yes
        kubeconfig: "{{kube_config}}"
        update_repo_cache: no
        wait: yes
        wait_timeout: 10m
        force: no
        state: present
        values:
    
          terminationGracePeriodSeconds: 300
          service:
            port: "{{ loki_port }}"
    
          loki:
            server:
              http_listen_port: "{{ loki_port }}"
            schema_config:
              configs:
                - from: 2021-09-01
                  store: boltdb-shipper
                  object_store: azure
                  schema: v11
                  index:
                    prefix: index_
                    period: 24h
            storage_config:
              boltdb_shipper:
                shared_store: azure
                active_index_directory: /data/loki/boltdb-shipper-active
                cache_location: /data/loki/boltdb-shipper-cache
                cache_ttl: 24h
              azure:
                container_name: "{{ az_container_name }}"
                account_name: "{{ az_account_name }}"
                account_key: "{{ az_access_key }}"
                request_timeout: 5m
              filesystem:
                directory: /data/loki/chunks
            
            chunk_store_config:
              max_look_back_period: 4464h
            table_manager:
              retention_deletes_enabled: true
              retention_period: 4464h

Also, I am not using any PVCs; instead, I send my Loki logs to a Blob storage container. PVCs are not an option for me.
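
The `mkdir /data: read-only file system` line in the log suggests that the container's root filesystem is mounted read-only, so the `/data/loki/...` directories cannot be created. A minimal sketch of the same `boltdb_shipper` settings pointed at a different directory follows. It assumes the chart mounts a writable volume at `/var/loki`, which is an assumption that has not been verified against chart version 5.0.0 and should be checked in the chart's values.yaml.

```yaml
# Sketch only: assumes a writable volume is mounted at /var/loki (unverified).
loki:
  storage_config:
    boltdb_shipper:
      shared_store: azure
      # Local working directories moved off the read-only root filesystem.
      active_index_directory: /var/loki/boltdb-shipper-active
      cache_location: /var/loki/boltdb-shipper-cache
      cache_ttl: 24h
```

These directories only hold the local index working copy and cache; chunks and shipped index files would still go to the Azure container configured under `azure`.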

Can anyone help me understand what is causing this error and how to resolve it?

ansible kubernetes-helm azure-aks grafana-loki