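I would like to be able to specify all my rules for, e.g., prometheus-blackbox-exporter, so I have added them to rules-mine.yaml and deployed with:
helm upgrade --install -n monitoring blackbox -f values.yaml -f rules-mine.yaml .
I don't see any rules listed at http://localhost:9090/rules, and nothing appears to be evaluated, so no alerts fire... I need to do everything the IaC way and deploy it automatically via terraform.
The rules-mine.yaml file contains: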
prometheusRule:
  enabled: true
  namespace: monitoring
  additionalLabels:
    team: foxtrot_blackbox
    environment: production
    cluster: cluster
    namespace: namespace_x
  namespace: "monitoring"
  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Blackbox probe failed (instance {{`{{`}} $labels.instance {{`}}`}})
        description: "Probe failed\n VALUE = {{`{{`}} $value {{`}}`}}"
    - alert: BlackboxSlowProbe
      expr: avg_over_time(probe_duration_seconds[1m]) > 1
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Blackbox slow probe (instance {{`{{`}} $labels.instance {{`}}`}})
        description: "Blackbox probe took more than 1s to complete\n VALUE = {{`{{`}} $value {{`}}`}}"
Thanks for your help...
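The best approach I have found is to add the exporter rules to the kube-prometheus-stack values.yaml file (I actually created a separate rules.yaml file) and pass it to helm:
helm upgrade --install -n monitoring prometheus --create-namespace -f values-mine.yaml -f rules-mine.yaml prometheus-community/kube-prometheus-stack
This picks up all the rules the way I want, which seems a decent solution. I would still prefer to have them grouped with the exporter, though; if I find a solution I will post again.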
additionalPrometheusRulesMap:
  prometheus.rules:
    groups:
      - name: company.prometheus.rules
        rules:
          - alert: PrometheusNotificationsBacklog
            expr: min_over_time(prometheus_notifications_queue_length[10m]) > 0
            for: 0m
            labels:
              severity: warning
            annotations:
              summary: Prometheus notifications backlog (instance {{ $labels.instance }})
              description: The Prometheus notification queue has not been empty for 10 minutes\nVALUE = {{ $value }}
              dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: ${wiki_url}/{{ $labels.alertname }}
  company.blackbox.rules:
    groups:
      - name: company.blackbox.rules
        rules:
          - alert: BlackboxProbeFailed
            expr: probe_success == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: Blackbox probe failed (instance {{ $labels.instance }})
              description: Probe failed\nVALUE = {{ $value }}
              dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: ${wiki_url}/{{ $labels.alertname }}
          - alert: BlackboxSlowProbe
            expr: avg_over_time(probe_duration_seconds[1m]) > 1
            for: 3m
            labels:
              severity: warning
            annotations:
              summary: Blackbox slow probe (instance {{ $labels.instance }})
              description: "Blackbox probe took more than 1s to complete\nVALUE = {{ $value }}"
              dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: ${wiki_url}/{{ $labels.alertname }}
  # etc....
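The ${grafana_url} and ${wiki_url} placeholders above are presumably filled in by Terraform before Helm reads the file (that is an assumption; the rendering step is not shown here). As a rough sketch, a small shell helper such as the hypothetical find_unrendered_vars below could flag any placeholder that was left unsubstituted in the file handed to helm:

```shell
# Hypothetical helper, not part of Helm or Terraform.
# Prints every line (with its line number) that still contains a ${name}
# placeholder. Exit status is 0 if at least one is found, 1 if none are.
find_unrendered_vars() {
  grep -n '\${[A-Za-z_][A-Za-z0-9_]*}' "$1"
}

# Example (file name taken from the commands above):
#   find_unrendered_vars rules-mine.yaml && echo "placeholders still present"
```

The Prometheus placeholders such as {{ $labels.instance }} are not matched, since the pattern only looks for a dollar sign followed directly by an opening brace.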
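A colleague found that this is entirely possible. The problem appears to be related to the quoting used in the original implementation. The following is now in use and working, so I'm posting it here in the hope that it is useful to others.
In summary:
{{`{{`}} $labels.instance {{`}}`}} == bad
{{`{{$labels.instance}}`}} == good
prometheusRule: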
  enabled: true
  additionalLabels:
    client: ${client_id}
    cluster: ${cluster}
    environment: ${environment}
    grafana: ${grafana_url}
  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: Blackbox probe failed for {{`{{$labels.instance}}`}}
        description: Probe failed VALUE = {{`{{$value}}`}}
        dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
        runbook_url: ${wiki_url}/BlackboxProbeFailed
    - alert: BlackboxSlowProbe
      expr: avg_over_time(probe_duration_seconds[1m]) > 1
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Blackbox slow probe for {{`{{$labels.instance}}`}}
        description: Blackbox probe took more than 1s to complete VALUE = {{`{{$value|humanizeDuration}}`}}
        dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
        runbook_url: ${wiki_url}/BlackboxSlowProbe
Please ignore any missing variables etc.
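To see what the escaping actually produces before deploying, the chart can be rendered locally with helm template and the output inspected. A minimal sketch, assuming the values files have already had their ${...} variables substituted; check_rendered_rules is a hypothetical helper name, not something Helm provides:

```shell
# Hypothetical helper: reads a rendered manifest on stdin and reports whether
# the $labels.instance placeholder survived Helm templating intact.
check_rendered_rules() {
  rendered=$(cat)
  # A leftover backtick suggests the Helm escape was not consumed by the template engine.
  if printf '%s\n' "$rendered" | grep -q '`'; then
    echo "unrendered Helm escape found"
    return 1
  fi
  # After rendering, Prometheus should be handed a plain {{ ... }} placeholder.
  if printf '%s\n' "$rendered" | grep -Eq '\{\{ *\$labels\.instance *\}\}'; then
    echo "ok"
    return 0
  fi
  echo "no \$labels.instance placeholder found"
  return 1
}

# Example, using the release name and files from the question:
#   helm template blackbox . -n monitoring -f values.yaml -f rules-mine.yaml | check_rendered_rules
```

This only checks the rendered text; whether Prometheus picks up the resulting rule still has to be confirmed on the /rules page.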
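Are you sure the label name "environment" is not a typo? This certainly won't do what you expect unless you have actually labelled your sources.
Best
Andrew's answer was very helpful, thanks.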
additionalPrometheusRulesMap - works well on kube-prometheus-stack-55.0.0
additionalPrometheusRules - deprecated
You can also look at another example of how to use additionalPrometheusRulesMap:
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L184