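I have deployed promtail, Grafana, Loki and AlertManager with Helm charts on a k3d cluster on my local machine. I want to define some rules in Loki so that AlertManager gets notified when certain things happen. For now I have only tried a few simple rules, just to check that the setup works.
My Loki version:
{"version":"2.6.1","revision":"6bd05c9a4","branch":"HEAD","buildUser":"root@ea1e89b8da02","buildDate":"2022-07-18T08:49:07Z","goVersion":""}
My Grafana version:
Loki configuration: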
loki:
  # should loki be deployed on cluster?
  enabled: true
  image:
    repository: grafana/loki
    pullPolicy: Always
    pullSecrets:
      - registry
  priorityClassName: normal
  resources:
    limits:
      memory: 3Gi
      cpu: 0
    requests:
      memory: 0
      cpu: 0
  config:
    chunk_store_config:
      max_look_back_period: 30d
    table_manager:
      retention_deletes_enabled: true
      retention_period: 30d
    query_range:
      split_queries_by_interval: 0
      parallelise_shardable_queries: false
    querier:
      max_concurrent: 2048
    frontend:
      max_outstanding_per_tenant: 4096
      compress_responses: true
    ingester:
      wal:
        enabled: true
        dir: /tmp/wal
    schema_config:
      configs:
        - from: 2022-12-05
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        active_index_directory: /tmp/loki/boltdb-shipper-active
        cache_location: /tmp/loki/boltdb-shipper-cache
        cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: filesystem
      filesystem:
        directory: /tmp/loki/chunks
    compactor:
      working_directory: /tmp/loki/boltdb-shipper-compactor
      shared_store: filesystem
    ruler:
      storage:
        type: local
        local:
          directory: /tmp/loki/rules/
      ring:
        kvstore:
          store: inmemory
      rule_path: /tmp/loki/rules-temp
      alertmanager_url: http://onprem-kube-prometheus-alertmanager.svc.mylocal-monitoring:9093
      enable_api: true
      enable_alertmanager_v2: true
  write:
    extraVolumeMounts:
      - name: rules-config
        mountPath: /tmp/loki/rules/fake/
    extraVolumes:
      - name: rules-config
        configMap:
          name: rules-cfgmap
          items:
            - key: "rules.yaml"
              path: "rules.yaml"
  read:
    extraVolumeMounts:
      - name: rules-config
        mountPath: /tmp/loki/rules/fake/
    extraVolumes:
      - name: rules-config
        configMap:
          name: rules-cfgmap
          items:
            - key: "rules.yaml"
              path: "rules.yaml"
promtail:
  image:
    registry: docker
    pullPolicy: Always
  imagePullSecrets:
    - name: registry
  priorityClassName: normal
  resources:
    limits:
      memory: 256Mi
      cpu: 0
    requests:
      memory: 0
      cpu: 0
  livenessProbe:
    failureThreshold: 5
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 10
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  config:
    snippets:
      pipelineStages:
        - cri: {}
      common:
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_node_name
          target_label: node_name
        - action: replace
          source_labels:
            - __meta_kubernetes_namespace
          target_label: namespace
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_container_name
          target_label: container
        - action: replace
          replacement: /var/log/pods/*$1/*.log
          separator: /
          source_labels:
            - __meta_kubernetes_pod_uid
            - __meta_kubernetes_pod_container_name
          target_label: __path__
        - action: replace
          replacement: /var/log/pods/*$1/*.log
          regex: true/(.*)
          separator: /
          source_labels:
            - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
            - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
            - __meta_kubernetes_pod_container_name
          target_label: __path__
monitoring:
enabled: false
networkPolicies:
enabled: false
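A side note on the `alertmanager_url` above: in-cluster Service DNS names take the form `<service>.<namespace>.svc.<cluster-domain>`, whereas the configured value puts `svc` before the namespace, so it may not resolve. The minimal Python sketch below builds the URL in the documented order. It assumes the Service is `onprem-kube-prometheus-alertmanager` in namespace `mylocal-monitoring` (both taken from the values above) and that the cluster domain is the default `cluster.local`. The `alertmanager_url` helper is hypothetical.

```python
def alertmanager_url(service: str, namespace: str, port: int = 9093,
                     cluster_domain: str = "cluster.local") -> str:
    """Build an in-cluster HTTP URL for a Kubernetes Service.

    Service DNS names are <service>.<namespace>.svc.<cluster-domain>;
    cluster_domain defaults to "cluster.local" (an assumption about the cluster).
    """
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}"


if __name__ == "__main__":
    # Service name and namespace are assumed from the Helm values above.
    print(alertmanager_url("onprem-kube-prometheus-alertmanager", "mylocal-monitoring"))
    # http://onprem-kube-prometheus-alertmanager.mylocal-monitoring.svc.cluster.local:9093
```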
The problem is that when I try to check the rules with
curl -X GET localhost:3100/loki/api/v1/rules
it shows me: unable to read rule dir /tmp/loki/rules/fake: open /tmp/loki/rules/fake: no such file or directory

It seems it cannot find the rules file.
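The path in the error follows from how the ruler reads local rule storage. It looks for rule files under `<ruler.storage.local.directory>/<tenant ID>/`. When authentication is disabled, Loki runs single-tenant with the tenant ID `fake`, which is the `fake` segment in the error. The Python sketch below models that path resolution and reproduces the "missing directory" case. The helper names are hypothetical, and the directory is taken from the values above.

```python
from __future__ import annotations

import os
import tempfile

FAKE_TENANT = "fake"  # tenant ID Loki uses when running without auth


def tenant_rule_dir(local_directory: str, tenant: str = FAKE_TENANT) -> str:
    """Directory the ruler reads for one tenant: <local_directory>/<tenant>."""
    return os.path.join(local_directory, tenant)


def rule_files(local_directory: str, tenant: str = FAKE_TENANT) -> list[str]:
    """Sorted names of regular files in the tenant's rule directory.

    Raises FileNotFoundError when the directory is missing, which mirrors
    the 'no such file or directory' error above.
    """
    d = tenant_rule_dir(local_directory, tenant)
    return sorted(n for n in os.listdir(d) if os.path.isfile(os.path.join(d, n)))


if __name__ == "__main__":
    print(tenant_rule_dir("/tmp/loki/rules/"))  # /tmp/loki/rules/fake
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(tenant_rule_dir(root))
        open(os.path.join(tenant_rule_dir(root), "rules.yaml"), "w").close()
        print(rule_files(root))  # ['rules.yaml']
```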
I also tried changing the configuration like this:
write:
  extraVolumeMounts:
    - name: rules-conf
      mountPath: /tmp/loki/rules/fake/rules.yaml
  extraVolumes:
    - name: rules-conf
read:
  extraVolumeMounts:
    - name: rules-conf
      mountPath: /tmp/loki/rules/fake/rules.yaml
  extraVolumes:
    - name: rules-conf
And my ConfigMap:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rules-cfgmap
  namespace: mylocal-monitoring
data:
  rules.yaml: |
    groups:
      - name: PrometheusAlertsGroup
        rules:
          - alert: test1
            expr: |
              1 > 0
            for: 0m
            labels:
              severity: critical
            annotations:
              summary: "TEST: testing test"
              description: test
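With a ConfigMap volume mounted without `subPath`, the `mountPath` becomes a directory, and each entry under `items` appears as a file at `<mountPath>/<path>`. The first layout should therefore place the rule file at `/tmp/loki/rules/fake/rules.yaml`. The second attempt mounts at `.../fake/rules.yaml` without `subPath`. If a ConfigMap source were attached there, it would instead create a directory named `rules.yaml`. The Python sketch below models only these two rules. The `projected_files` helper is hypothetical, not a Kubernetes API.

```python
from __future__ import annotations

import posixpath


def projected_files(mount_path: str, items: list[dict],
                    sub_path: str | None = None) -> list[str]:
    """Container paths at which a ConfigMap volume's items appear.

    Without subPath, the volume is a directory mounted at mount_path and each
    item is a file at <mount_path>/<item path>.
    """
    if sub_path is not None:
        # Only the named entry is mounted, directly at mount_path.
        # (Assumes sub_path matches one of the items; other cases not modelled.)
        return [mount_path]
    return [posixpath.join(mount_path, item["path"]) for item in items]


items = [{"key": "rules.yaml", "path": "rules.yaml"}]
print(projected_files("/tmp/loki/rules/fake/", items))
# ['/tmp/loki/rules/fake/rules.yaml']
print(projected_files("/tmp/loki/rules/fake/rules.yaml", items))
# ['/tmp/loki/rules/fake/rules.yaml/rules.yaml']
print(projected_files("/tmp/loki/rules/fake/rules.yaml", items, sub_path="rules.yaml"))
# ['/tmp/loki/rules/fake/rules.yaml']
```

If the path works out but the error persists, it may be worth running `kubectl describe pod` on the pod that answers on port 3100. That shows whether the pod actually has this volume mounted.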
And the rules file:
groups:
  - name: PrometheusAlertsGroup
    rules:
      - alert: test1
        expr: |
          1 > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "TEST: testing test"
          description: test
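In YAML, a plain (unquoted) value cannot contain `: `, so an annotation such as `TEST: testing test` has to be quoted. Otherwise the parser treats it as a nested mapping and rejects the file. One way to sidestep quoting mistakes is to generate the rule file as JSON, since JSON documents are also valid YAML 1.2 and `json.dumps` always quotes strings. The Python sketch below does this for the rule above. The `alert_rules_file` helper is hypothetical. It assumes Loki's YAML loader accepts JSON-style input, which it should for standard parsers but which has not been checked against Loki 2.6 here.

```python
from __future__ import annotations

import json


def alert_rules_file(group: str, alerts: list[dict]) -> str:
    """Serialise a Prometheus-style rule group as JSON (a subset of YAML 1.2).

    json.dumps quotes every string, so values containing ': ' stay scalars.
    """
    return json.dumps({"groups": [{"name": group, "rules": alerts}]}, indent=2)


rules_yaml = alert_rules_file("PrometheusAlertsGroup", [{
    "alert": "test1",
    "expr": "1 > 0",
    "for": "0m",
    "labels": {"severity": "critical"},
    "annotations": {"summary": "TEST: testing test", "description": "test"},
}])
print(rules_yaml)
```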
But the problem is the same. Any ideas?
When I created
/tmp/loki/rules/fake/rules.yaml
manually, it finally worked, but the whole point is not to have to create it by hand.
Could you share your final Helm values, ConfigMap and rules.yaml? I am running into the same problem.