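My Prometheus/Alertmanager (0.26.0) setup runs in Docker Compose on a virtual machine.
My problem is that, most of the time, alerts are not fully displayed in Slack.
Say I receive a good alert. When I then run docker-compose restart and switch to another instance that I know is failing and will trigger an alert, the notification does not give me the data I need.
I don't think my alert format is the problem, because as you can see the first notification shows all the data, while the later notifications are missing data.
Any ideas or suggestions on how to get my alerts to display fully in every Slack notification?
Slack message:
Alertmanager UI:
Alertmanager debug logs: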
prometheus.yml
global:
  scrape_interval: 10s
  evaluation_interval: 5s
scrape_configs:
  - job_name: My Hosts
    metrics_path: /probe
    params:
      module: [icmp]
    file_sd_configs:
      - files:
          - '/etc/prometheus/targets.yml'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox:9115
rule_files:
  - "/etc/prometheus/rules/my_rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager:9093'
my_targets.yml
- targets: ['10.113.204.3']
  labels:
    instance_name: '44-base-ws03.dddemo3.com'
    cluster_name: '44-Test'
my_rules.yml
groups:
  - name: Instancess
    rules:
      - alert: InstanceDown
        expr: probe_success == 0
        for: 10s
        labels:
          severity: CRITICAL
          instance: "{{ $labels.instance }}"
          job: "{{ $labels.job }}"
          instance_name: "{{ $labels.instance_name }}"
          cluster_name: "{{ $labels.cluster_name }}"
        annotations:
          description: '{{ $labels.instance }} has been down for more than 5 minutes.'
          summary: 'Instance: {{ $labels.instance }} is down'
alertmanager.yml
global:
  slack_api_url: 'my_slack_url'
route:
  receiver: 'slack-notifications'
  group_by: ['alertname']
  group_interval: 1m
  group_wait: 20s
  repeat_interval: 2m
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#se_goes_alerts_dev'
        send_resolved: true
        title: ':bullhorn-slack: {{ .CommonLabels.severity | toUpper }}: {{ .CommonLabels.alertname }} {{ .CommonLabels.instance }} - {{ .CommonLabels.job }}'
        text: |
          Description: {{ .CommonAnnotations.description }}
          Summary: {{ .CommonAnnotations.summary }}
          Instance Name: {{ .CommonLabels.instance_name }}
          cluster_name: {{ .CommonLabels.cluster_name }}
          instance: {{ .CommonLabels.instance }}
docker-compose.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus:/etc/prometheus
    ports:
      - '9090:9090'
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.enable-remote-write-receiver'
    networks:
      - my-network
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
      - '--log.level=debug'
    ports:
      - '9093:9093'
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - ./alertmanager:/etc/alertmanager
    networks:
      - my-network
    depends_on:
      - prometheus
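I tried debugging with
--log.level=debug
and checked various GitHub issues, but could not find an answer.
The Slack alerts that Alertmanager generates for me are inconsistent from one notification to the next.
Any help is greatly appreciated.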
I'm facing a similar issue. Did you manage to get it working?
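One possible cause worth checking, given the alertmanager.yml above: the route uses group_by: ['alertname'], so every InstanceDown alert lands in the same group, and .CommonLabels / .CommonAnnotations only contain the labels and annotations whose values are identical across all alerts in that group. As soon as the group holds alerts for two different instances (for example a new firing alert plus a resolved one that is still in the group), instance, instance_name, description and summary are no longer common and render as empty. A minimal sketch of a receiver that iterates over the individual alerts instead, reusing the channel and label names from the question (the exact layout of the message is just an assumption, adjust to taste):

```yaml
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#se_goes_alerts_dev'
        send_resolved: true
        # alertname is safe to read from CommonLabels because the route groups by it
        title: '{{ .Status | toUpper }}: {{ .CommonLabels.alertname }} ({{ len .Alerts }} alert(s))'
        text: |-
          {{ range .Alerts }}
          Status: {{ .Status }}
          Severity: {{ .Labels.severity }}
          Description: {{ .Annotations.description }}
          Summary: {{ .Annotations.summary }}
          Instance Name: {{ .Labels.instance_name }}
          cluster_name: {{ .Labels.cluster_name }}
          instance: {{ .Labels.instance }}
          job: {{ .Labels.job }}
          {{ end }}
```

If one Slack message per instance is preferred, an alternative is to keep the original template and change the route to group_by: ['alertname', 'instance'], so that each group contains a single instance and the common labels stay populated.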