Error when configuring Prometheus to scrape a node exporter DaemonSet on Kubernetes


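I have set up a node exporter DaemonSet on Kubernetes, along with a Service that points to the node exporter Pod IPs (I followed this tutorial). When I run

kubectl get endpoints -n monitoring

I can confirm that the Service correctly points to the 3 DaemonSet Pods that were created.

After that, I added this configuration to my prometheus.yml file to scrape the node exporter metrics: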

  - job_name: "node-exporter"
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - monitoring
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: "node-exporter"
        action: keep

The problem is that when I apply this configuration and restart prometheus.service, I get the following:

> systemctl status prometheus.service --no-pager --full

● prometheus.service - PromServer
     Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-06-18 16:24:18 UTC; 1min 41s ago
   Main PID: 1441565 (prometheus)
      Tasks: 10 (limit: 33613)
     Memory: 38.2M
        CPU: 325ms
     CGroup: /system.slice/prometheus.service
             └─1441565 /usr/local/bin/prometheus --web.enable-admin-api --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /var/lib/prometheus/ --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries

Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.654Z caller=head.go:755 level=info component=tsdb msg="WAL segment loaded" segment=158 maxSegment=159
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.655Z caller=head.go:755 level=info component=tsdb msg="WAL segment loaded" segment=159 maxSegment=159
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.655Z caller=head.go:792 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=7.37407ms wal_replay_duration=43.4001ms wbl_replay_duration=200ns total_replay_duration=51.456586ms
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.657Z caller=main.go:1040 level=info fs_type=EXT4_SUPER_MAGIC
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.657Z caller=main.go:1043 level=info msg="TSDB started"
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.657Z caller=main.go:1224 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.661Z caller=manager.go:317 level=error component="discovery manager scrape" msg="Cannot create service discovery" err="unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined" type=kubernetes config=node-exporter
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.664Z caller=main.go:1261 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=6.861558ms db_storage=1.6µs remote_storage=1.2µs web_handler=500ns query_engine=800ns scrape=3.46318ms scrape_sd=100.802µs notify=42.801µs notify_sd=13.7µs rules=2.858566ms tracing=8.1µs
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.664Z caller=main.go:1004 level=info msg="Server is ready to receive web requests."
Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.664Z caller=manager.go:995 level=info component="rule manager" msg="Starting rule manager..."

The output contains this error:


Jun 18 16:24:18 vm-devops-tiim-giro-prd prometheus[1441565]: ts=2024-06-18T16:24:18.661Z caller=manager.go:317 level=error component="discovery manager scrape" msg="Cannot create service discovery" err="unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined" type=kubernetes config=node-exporter

My Googling has not turned up anything so far. Can anyone help me figure out which variables are missing from my configuration?

1 Answer

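The error indicates that Prometheus cannot find the in-cluster configuration (KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT) it needs to talk to the Kubernetes API server. Kubernetes injects these variables into every pod automatically, so the error typically means that Prometheus is running outside the cluster. The systemctl output in the question suggests Prometheus runs as a systemd service on a VM.

You can work through the following troubleshooting steps to resolve the issue: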

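  • If Prometheus runs inside the cluster, make sure its Deployment includes the environment variables mentioned above. If they are missing, add them with the following configuration:

        env:
          - name: KUBERNETES_SERVICE_HOST
            value: kubernetes.default.svc
          - name: KUBERNETES_SERVICE_PORT
            value: "443"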

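  • Make sure the service account Prometheus uses has the RBAC permissions required to access services and endpoints.

  • If you have Prometheus logs, monitor them for any errors related to service discovery.

  • Double-check that prometheus.yml uses the correct namespace and the correct endpoints name.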
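
As a rough illustration of the RBAC step, the manifest below is a minimal sketch of the permissions that endpoints-based discovery generally needs. The ServiceAccount name prometheus and the object name prometheus-discovery are assumptions made for this example and do not come from the question; only the monitoring namespace does.

```yaml
# Sketch only: names marked "hypothetical" are not from the original post.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-discovery        # hypothetical name
rules:
  - apiGroups: [""]
    # role: endpoints discovery reads endpoints plus the related services and pods
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-discovery        # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-discovery
subjects:
  - kind: ServiceAccount
    name: prometheus                # hypothetical ServiceAccount
    namespace: monitoring
```

Because the scrape job only watches the monitoring namespace, a namespaced Role and RoleBinding with the same rules would probably be enough.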

Note:

If you are trying to export metrics to an external Prometheus that is not running in the same cluster as Kubernetes, follow this link
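
For the external case, one possible shape of the scrape job is sketched below. It points kubernetes_sd_configs at the API server explicitly instead of relying on in-cluster configuration. The API server address, the token path, and the CA path are placeholders assumed for this example; the token would have to belong to a service account with discovery permissions.

```yaml
# Sketch only: the address and file paths are placeholders, not values from the post.
scrape_configs:
  - job_name: "node-exporter"
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://<api-server-host>:6443        # placeholder address
        authorization:
          credentials_file: /etc/prometheus/k8s-token     # hypothetical path to a service account token
        tls_config:
          ca_file: /etc/prometheus/k8s-ca.crt             # hypothetical path to the cluster CA certificate
        namespaces:
          names:
            - monitoring
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: "node-exporter"
        action: keep
```

Note that discovery returns Pod IPs as targets, so this only helps if the Prometheus host can actually reach those addresses over the network.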
