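I'm trying to implement a VMRule that fires an alert when it detects pods, deployments, daemonsets, statefulsets, or namespaces whose names contain the substring "tmp". I'm not sure this is the right expression for that task, though. Does anyone have ideas?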
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: tmp-instance-detected
  labels:
    project: main
spec:
  groups:
    - name: vm-health
      rules:
        - alert: TmpInstanceDetected
          expr: changes(process_start_time_seconds{job=~".*tmp.*", kubernetes="*"}[15m]) > 0
          labels:
            severity: S3
          annotations:
            summary: "Tmp instance detected: {{ $labels.job }} (pod {{ $labels.pod }})"
            description: "A job or instance with 'tmp' in its name (instance {{ $labels.instance }}, pod {{ $labels.pod }}) has been detected."
The expression looks fine, except for kubernetes="*". Are you sure it should be written that way rather than as something like kubernetes=~".*"?
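To make the difference concrete: the = matcher compares the label value as an exact string, so kubernetes="*" selects only series whose kubernetes label is literally "*", while =~ treats the value as a fully anchored regular expression. Below is a minimal sketch of the rule from the question with only that matcher changed. It assumes the intent was "any value of the kubernetes label", which the question does not confirm; the annotations are left out for brevity.

```yaml
# Sketch only: same rule as in the question, with the kubernetes matcher switched to a regex.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: tmp-instance-detected
  labels:
    project: main
spec:
  groups:
    - name: vm-health
      rules:
        - alert: TmpInstanceDetected
          # kubernetes="*"   -> exact match on the literal string "*"
          # kubernetes=~".*" -> regex match on any value (assumed intent)
          expr: changes(process_start_time_seconds{job=~".*tmp.*", kubernetes=~".*"}[15m]) > 0
          labels:
            severity: S3
```

Since ".*" also matches an empty value, kubernetes=~".*" selects series that lack the label as well, so it behaves the same as dropping the matcher. If the label has to be present and non-empty, kubernetes=~".+" would be the form to use.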
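The expression fires when a new job whose name contains the word tmp is written to VictoriaMetrics. It then keeps returning values > 0 for 15 minutes, the length of the [15m] lookbehind window.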