Alert based on comparing a metric's value with one of its own label values

Question

I am using kube-prometheus-stack, and the YAML snippets you see below are part of a PrometheusRule definition.

This is a completely hypothetical scenario, the simplest one I could think of that illustrates my point.

Given this kind of metric:

cpu_usage{job="job-1", must_be_lower_than="50"} 33.72
cpu_usage{job="job-2", must_be_lower_than="80"} 56.89
# imagine there are plenty more lines here
# with various different values for the must_be_lower_than label
# ...

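I want a single alert that compares each series' value against its own must_be_lower_than label and fires when the value exceeds it. Something like this (it does not work as written; it is only meant to demonstrate the idea):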

alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
expr: cpu_usage > $must_be_lower_than
for: 5m

P.S. I already know that I can define the alerts like this:

alert: CpuUsageTooHigh50
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above 50% for 5 minutes.'
expr: cpu_usage{must_be_lower_than="50"} > 50
for: 5m
---
alert: CpuUsageTooHigh80
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above 80% for 5 minutes.'
expr: cpu_usage{must_be_lower_than="80"} > 80
for: 5m

This is not what I am looking for, because I would have to manually define an alert for some of the various values of the must_be_lower_than label.

Answer 1

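There is currently no way to do this kind of "templating" in Prometheus.

The only way to get something close is to use recording rules that define the maximum value for each value of the label: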

rules:
- record: max_cpu_usage
  expr: vector(50)
  labels:
    must_be_lower_than: "50"
- record: max_cpu_usage
  expr: vector(80)
  labels:
    must_be_lower_than: "80"
# ... other possible values

Then use it in your alerting rule:

alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
expr: cpu_usage > on(must_be_lower_than) group_left max_cpu_usage
for: 5m
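
Since the question mentions kube-prometheus-stack, here is a minimal sketch of how the recording rules and the alert above might be combined into one PrometheusRule resource. The resource name, the group name, and the `release` label are assumptions for illustration; the label in particular has to match whatever rule selector your Prometheus instance is configured with. Each recording rule should produce a series such as `max_cpu_usage{must_be_lower_than="50"} 50`, and the `on(must_be_lower_than) group_left` match then compares every `cpu_usage` series with the threshold series that carries the same label value.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-usage-thresholds          # hypothetical name
  labels:
    release: kube-prometheus-stack    # assumption: must match your Prometheus rule selector
spec:
  groups:
    - name: cpu-usage-thresholds.rules   # hypothetical group name
      rules:
        # One recording rule per threshold value that appears in the label.
        - record: max_cpu_usage
          expr: vector(50)
          labels:
            must_be_lower_than: "50"
        - record: max_cpu_usage
          expr: vector(80)
          labels:
            must_be_lower_than: "80"
        # Many cpu_usage series match one max_cpu_usage series per label value,
        # hence group_left.
        - alert: CpuUsageTooHigh
          annotations:
            message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
          expr: cpu_usage > on(must_be_lower_than) group_left max_cpu_usage
          for: 5m
```

With the sample data from the question, job-1 (33.72 against 50) and job-2 (56.89 against 80) would both stay below their thresholds, so neither would fire. A series whose must_be_lower_than value has no matching recording rule is dropped by the match and never alerts, so the list of recording rules still has to cover every threshold in use.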
