Helm / kube-prometheus-stack: Can I create rules for exporters in values.yaml?

Question  Votes: 0  Answers: 4

I want to be able to specify all of my rules for an exporter, e.g. prometheus-blackbox-exporter, so I added them to rules-mine.yaml and deployed with

helm upgrade --install -n monitoring blackbox -f values.yaml -f rules-mine.yaml .

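I do not see any of the rules listed at http://localhost:9090/rules, and nothing seems to be evaluated, since no alerts fire... I need to do everything as IaC and deploy it through terraform in an automated way.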

  • Is it possible to add rules to an exporter this way?
  • If so, can anyone see a problem with the file below?
  • If not, how can I add rules for many exporters efficiently?

The rules-mine.yaml file contains:

prometheusRule:
  enabled:  true
  namespace: monitoring
  additionalLabels:
    team: foxtrot_blackbox
    environment: production
    cluster: cluster
    namespace: namespace_x
  namespace: "monitoring"

  rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox probe failed (instance {{`{{`}} $labels.instance {{`}}`}})
      description: "Probe failed\n  VALUE = {{`{{`}} $value {{`}}`}}"

  - alert: BlackboxSlowProbe
    expr: avg_over_time(probe_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Blackbox slow probe (instance {{`{{`}} $labels.instance {{`}}`}})
      description: "Blackbox probe took more than 1s to complete\n  VALUE = {{`{{`}} $value {{`}}`}}"

Thanks for any help...

prometheus kubernetes-helm prometheus-alertmanager prometheus-operator kube-prometheus-stack
4 Answers

3 votes

The best approach I have found seems to be to add the exporter rules to the kube-prometheus-stack values.yaml file (I actually created a separate rules.yaml file) and pass it to helm:

  • helm upgrade --install -n monitoring prometheus --create-namespace -f values-mine.yaml -f rules-mine.yaml prometheus-community/kube-prometheus-stack

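All of the rules are then picked up the way I want, which seems like a decent solution. I would still prefer to have them grouped with the exporters, though. If I find a solution, I will post again.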

additionalPrometheusRulesMap:
  prometheus.rules:
    groups:
    - name: company.prometheus.rules
      rules:
      - alert: PrometheusNotificationsBacklog
        expr: min_over_time(prometheus_notifications_queue_length[10m]) > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Prometheus notifications backlog (instance {{ $labels.instance }})
          description: "The Prometheus notification queue has not been empty for 10 minutes\nVALUE = {{ $value }}"
          dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
          runbook_url: ${wiki_url}/{{ $labels.alertname }}

  company.blackbox.rules:
    groups:
    - name: company.blackbox.rules
      rules:
      - alert: BlackboxProbeFailed
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Blackbox probe failed (instance {{ $labels.instance }})
          description: "Probe failed\nVALUE = {{ $value }}"
          dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
          runbook_url: ${wiki_url}/{{ $labels.alertname }}

      - alert: BlackboxSlowProbe
        expr: avg_over_time(probe_duration_seconds[1m]) > 1
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: Blackbox slow probe (instance {{ $labels.instance }})
          description: "Blackbox probe took more than 1s to complete\nVALUE = {{ $value }}"
          dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
          runbook_url: ${wiki_url}/{{ $labels.alertname }}

# etc....
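
On keeping the rules grouped with the exporter: one possible reason the exporter chart's own rules never showed up is that kube-prometheus-stack, by default, configures Prometheus to select only PrometheusRule objects labelled release: <name of the kube-prometheus-stack release>. The sketch below is an assumption, not something verified in this thread. It assumes the stack was installed as release prometheus (as in the command above) and that the exporter chart copies prometheusRule.additionalLabels onto the PrometheusRule it creates.

```yaml
# rules-mine.yaml for the prometheus-blackbox-exporter release (sketch)
prometheusRule:
  enabled: true
  namespace: monitoring
  additionalLabels:
    # Assumption: this must equal the kube-prometheus-stack release name
    # so that the default ruleSelector picks this PrometheusRule up.
    release: prometheus
  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 1m
      labels:
        severity: critical
```

Alternatively, setting prometheus.prometheusSpec.ruleSelectorNilUsesHelmValues: false in the kube-prometheus-stack values should make Prometheus select rules regardless of that label.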

1 vote

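A colleague found that this is entirely possible. The problem appears to be related to the quoting used in the original implementation. The following is now in use and working, so I am posting it here in the hope that it is useful to others.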

In summary,

  • {{`{{`}} $labels.instance {{`}}`}} == bad
  • {{`{{$labels.instance}}`}} == good

prometheusRule:
  enabled: true
  additionalLabels:
    client: ${client_id}
    cluster: ${cluster}
    environment: ${environment}
    grafana: ${grafana_url}

  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: Blackbox probe failed for {{`{{$labels.instance}}`}}
        description: Probe failed VALUE = {{`{{$value}}`}}
        dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
        runbook_url: ${wiki_url}/BlackboxProbeFailed

    - alert: BlackboxSlowProbe
      expr: avg_over_time(probe_duration_seconds[1m]) > 1
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Blackbox slow probe for {{`{{$labels.instance}}`}}
        description: Blackbox probe took more than 1s to complete VALUE = {{`{{$value|humanizeDuration}}`}}
        dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
        runbook_url: ${wiki_url}/BlackboxSlowProbe

Please ignore any missing variables etc.
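
To illustrate what the escaping achieves: Helm treats the backtick-quoted text inside {{` `}} as a Go template raw string and emits it literally, so Prometheus receives an ordinary {{$labels.instance}} template. Assuming the exporter chart passes prometheusRule.rules through Helm's template engine, the rendered object would look roughly like the sketch below. The metadata name and the group name are hypothetical, and the ${...} placeholders are presumed to be filled in by terraform before Helm runs.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-prometheus-blackbox-exporter   # hypothetical name
spec:
  groups:
    - name: blackbox-exporter                   # hypothetical group name
      rules:
        - alert: BlackboxProbeFailed
          expr: probe_success == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            # Helm has removed the outer escaping; Prometheus expands the rest.
            summary: Blackbox probe failed for {{$labels.instance}}
            description: Probe failed VALUE = {{$value}}
```

Running helm template with the same -f arguments is one way to check what is actually rendered before deploying.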


0 votes

Are you sure the label name "environment" is not mistyped? It certainly will not do what you expect unless you have actually labelled your sources.

Best


0 votes

Andrew's answer was helpful, thanks.

additionalPrometheusRulesMap - works fine on kube-prometheus-stack-55.0.0

additionalPrometheusRules - deprecated

You can also look at another example of how to use additionalPrometheusRulesMap:
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L184
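
For comparison, here is a minimal sketch of the same alert written in both forms. The group and rule names are taken from the earlier answer purely for illustration. In practice you would use one form or the other; they are shown together only to make the difference visible.

```yaml
# Deprecated list form: each entry carries its own name field.
additionalPrometheusRules:
  - name: company.blackbox.rules
    groups:
      - name: company.blackbox.rules
        rules:
          - alert: BlackboxProbeFailed
            expr: probe_success == 0
            for: 1m
            labels:
              severity: critical

# Map form: the key takes the place of the name field.
additionalPrometheusRulesMap:
  company.blackbox.rules:
    groups:
      - name: company.blackbox.rules
        rules:
          - alert: BlackboxProbeFailed
            expr: probe_success == 0
            for: 1m
            labels:
              severity: critical
```

A practical advantage of the map form is that Helm deep-merges maps across several -f files, whereas a list in a later file replaces the earlier list wholesale, so rules can be split over multiple values files.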
