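I am using kube-prometheus-stack, and the YAML snippets you see below are part of a PrometheusRule definition.
This is a completely hypothetical scenario, the simplest one I could think of that illustrates my point.
Consider metrics like these: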
cpu_usage{job="job-1", must_be_lower_than="50"} 33.72
cpu_usage{job="job-2", must_be_lower_than="80"} 56.89
# imagine there are plenty more lines here
# with various different values for the must_be_lower_than label
# ...
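I want a single alert that compares each series' value against its own must_be_lower_than label and fires when the value exceeds it. Something like this (it does not work as written; it is only meant to demonstrate the idea):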
alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
expr: cpu_usage > $must_be_lower_than
for: 5m
P.S. I already know that I can define the alerts like this:
alert: CpuUsageTooHigh50
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above 50% for 5 minutes.'
expr: cpu_usage{must_be_lower_than="50"} > 50
for: 5m
---
alert: CpuUsageTooHigh80
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above 80% for 5 minutes.'
expr: cpu_usage{must_be_lower_than="80"} > 80
for: 5m
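This is not what I am looking for, because I would have to manually define a separate alert for each of the various values of the must_be_lower_than label.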
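There is currently no way to do this kind of "templating" in Prometheus.
The only way to get something close is to use recording rules that define the maximum value for each value of the label: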
rules:
  - record: max_cpu_usage
    expr: vector(50)
    labels:
      must_be_lower_than: "50"
  - record: max_cpu_usage
    expr: vector(80)
    labels:
      must_be_lower_than: "80"
  # ... other possible values
Then use it in your alerting rule:
alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
expr: cpu_usage > on(must_be_lower_than) group_left max_cpu_usage
for: 5m
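Since the question mentions kube-prometheus-stack, here is a rough sketch of how the recording rules and the alert might sit together in a single PrometheusRule manifest. The resource name, the group name, and the release label are assumptions made for illustration; the label in particular has to match whatever ruleSelector your Prometheus instance is configured with.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-usage-thresholds          # hypothetical name
  labels:
    release: kube-prometheus-stack    # assumption: must match your Prometheus ruleSelector
spec:
  groups:
    - name: cpu-usage-thresholds.rules   # hypothetical group name
      rules:
        # One recording rule per threshold; the label value is what gets matched on.
        - record: max_cpu_usage
          expr: vector(50)
          labels:
            must_be_lower_than: "50"
        - record: max_cpu_usage
          expr: vector(80)
          labels:
            must_be_lower_than: "80"
        # Many cpu_usage series may share one threshold series, hence group_left.
        - alert: CpuUsageTooHigh
          annotations:
            message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
          expr: cpu_usage > on(must_be_lower_than) group_left max_cpu_usage
          for: 5m
```

Rules within a group are evaluated in order, so placing the recording rules before the alert means the thresholds are recorded before the alert expression runs in each cycle. A cpu_usage series whose must_be_lower_than value has no matching max_cpu_usage series simply drops out of the comparison and never alerts, so a recording rule is still needed for every value in use.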