I'm trying to set up remote write to a Thanos receiver with docker compose on a VM, but for some reason the receiver API endpoint fails whenever Prometheus sends a POST request:
Error 500, message: context deadline exceeded
which apparently means something is timing out somewhere.
What I've tried:
Increased the remote-write timeout in the Prometheus config. Didn't help.
Added resource limits/reservations to the thanos-receiver container. Didn't help either.
PS - Prometheus is deployed from a separate docker compose file on a different docker network, but I attached/connected the Prometheus container to the thanos docker network.
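For reference, attaching an already-running container to another compose project's network can be done with docker network connect; a sketch, assuming the Thanos network ends up named thanos_thanos-net (compose prefixes the network with the project/directory name, so the exact name depends on where the Thanos compose file lives):

# run once on the VM, after both stacks are up
docker network connect thanos_thanos-net prometheus

Alternatively, the same network can be declared as external in the Prometheus compose file and listed under the prometheus service's networks.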
Thanos receiver logs
thanos-receiver | ts=2024-11-13T17:23:56.550008182Z caller=main.go:77 level=debug msg="maxprocs: Updating GOMAXPROCS=[2]: determined from CPU quota"
thanos-receiver | ts=2024-11-13T17:23:56.550415117Z caller=receive.go:137 level=info component=receive mode=RouterOnly msg="running receive"
thanos-receiver | ts=2024-11-13T17:23:56.550436609Z caller=options.go:26 level=info component=receive protocol=HTTP msg="disabled TLS, key and cert must be set to enable"
thanos-receiver | ts=2024-11-13T17:23:56.55051533Z caller=receive.go:747 level=info component=receive msg="default tenant data dir already present, not attempting to migrate storage"
thanos-receiver | ts=2024-11-13T17:23:56.550591853Z caller=handler.go:148 level=info component=receive component=receive-handler msg="Starting receive handler with async forward workers" workers=5
thanos-receiver | ts=2024-11-13T17:23:56.550728876Z caller=receive.go:292 level=debug component=receive msg="setting up hashring"
thanos-receiver | ts=2024-11-13T17:23:56.550873095Z caller=receive.go:299 level=debug component=receive msg="setting up HTTP server"
thanos-receiver | ts=2024-11-13T17:23:56.550896668Z caller=receive.go:317 level=debug component=receive msg="setting up gRPC server"
thanos-receiver | ts=2024-11-13T17:23:56.550907324Z caller=options.go:26 level=info component=receive protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
thanos-receiver | ts=2024-11-13T17:23:56.551241139Z caller=receive.go:389 level=debug component=receive msg="setting up receive HTTP handler"
thanos-receiver | ts=2024-11-13T17:23:56.551257544Z caller=receive.go:418 level=debug component=receive msg="setting up periodic tenant pruning"
thanos-receiver | ts=2024-11-13T17:23:56.551269263Z caller=receive.go:455 level=info component=receive msg="starting receiver"
thanos-receiver | ts=2024-11-13T17:23:56.55280892Z caller=intrumentation.go:75 level=info component=receive msg="changing probe status" status=healthy
thanos-receiver | ts=2024-11-13T17:23:56.552880529Z caller=http.go:73 level=info component=receive service=http/server component=receive msg="listening for requests and metrics" address=0.0.0.0:10909
thanos-receiver | ts=2024-11-13T17:23:56.553293788Z caller=tls_config.go:313 level=info component=receive service=http/server component=receive msg="Listening on" address=[::]:10909
thanos-receiver | ts=2024-11-13T17:23:56.553767727Z caller=tls_config.go:316 level=info component=receive service=http/server component=receive msg="TLS is disabled." http2=false address=[::]:10909
thanos-receiver | ts=2024-11-13T17:23:56.553524411Z caller=receive.go:376 level=info component=receive msg="listening for StoreAPI and WritableStoreAPI gRPC" address=0.0.0.0:10907
thanos-receiver | ts=2024-11-13T17:23:56.553849437Z caller=intrumentation.go:75 level=info component=receive msg="changing probe status" status=healthy
thanos-receiver | ts=2024-11-13T17:23:56.553552921Z caller=handler.go:407 level=info component=receive component=receive-handler msg="Start listening for connections" address=0.0.0.0:10908
thanos-receiver | ts=2024-11-13T17:23:56.554509516Z caller=handler.go:425 level=info component=receive component=receive-handler msg="Serving plain HTTP" address=0.0.0.0:10908
thanos-receiver | ts=2024-11-13T17:23:56.553732526Z caller=config.go:288 level=debug component=receive component=config-watcher msg="refreshed hashring config"
thanos-receiver | ts=2024-11-13T17:23:56.554569921Z caller=receive.go:546 level=info component=receive msg="Set up hashring for the given hashring config."
thanos-receiver | ts=2024-11-13T17:23:56.554589272Z caller=intrumentation.go:56 level=info component=receive msg="changing probe status" status=ready
thanos-receiver | ts=2024-11-13T17:23:56.554442861Z caller=grpc.go:167 level=info component=receive service=gRPC/server component=receive msg="listening for serving gRPC" address=0.0.0.0:10907
thanos-receiver | ts=2024-11-13T17:24:06.431193468Z caller=handler.go:584 level=debug component=receive component=receive-handler tenant=default-tenant msg="failed to handle request" err="context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.431293553Z caller=handler.go:595 level=error component=receive component=receive-handler tenant=default-tenant err="context deadline exceeded" msg="internal server error"
thanos-receiver | ts=2024-11-13T17:24:06.432118339Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.489258283Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.489332376Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.520580431Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.528166389Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.528767937Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.528783156Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.53874829Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.538775407Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.539757427Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver | ts=2024-11-13T17:24:06.539774847Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
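What the debug lines show is the receive handler forwarding each incoming remote-write request over gRPC to the hashring endpoint thanos-receiver:10907 (i.e. back to itself) and that forward hitting its deadline. A quick sanity check to rule out name-resolution problems on the Thanos network (a sketch; the network name thanos_thanos-net is again an assumption):

docker run --rm --network thanos_thanos-net busybox nslookup thanos-receiver
docker run --rm --network thanos_thanos-net busybox wget -qO- http://thanos-receiver:10909/-/ready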
Prometheus logs
prometheus | ts=2024-11-13T17:31:18.019Z caller=head.go:714 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=122.273211ms
prometheus | ts=2024-11-13T17:31:18.022Z caller=head.go:722 level=info component=tsdb msg="Replaying WAL, this may take a while"
prometheus | ts=2024-11-13T17:31:18.127Z caller=head.go:759 level=info component=tsdb msg="WAL checkpoint loaded"
prometheus | ts=2024-11-13T17:31:18.195Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=71 maxSegment=75
prometheus | ts=2024-11-13T17:31:18.224Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=72 maxSegment=75
prometheus | ts=2024-11-13T17:31:18.668Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=73 maxSegment=75
prometheus | ts=2024-11-13T17:31:19.040Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=74 maxSegment=75
prometheus | ts=2024-11-13T17:31:19.058Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=75 maxSegment=75
prometheus | ts=2024-11-13T17:31:19.060Z caller=head.go:831 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=110.948691ms wal_replay_duration=927.474664ms wbl_replay_duration=231ns chunk_snapshot_load_duration=0s mmap_chunk_replay_duration=122.273211ms total_replay_duration=1.163680528s
prometheus | ts=2024-11-13T17:31:19.100Z caller=main.go:1218 level=info fs_type=EXT4_SUPER_MAGIC
prometheus | ts=2024-11-13T17:31:19.100Z caller=main.go:1221 level=info msg="TSDB started"
prometheus | ts=2024-11-13T17:31:19.101Z caller=main.go:1404 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yaml
prometheus | ts=2024-11-13T17:31:19.111Z caller=dedupe.go:112 component=remote level=info remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Starting WAL watcher" queue=84794c
prometheus | ts=2024-11-13T17:31:19.114Z caller=dedupe.go:112 component=remote level=info remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Starting scraped metadata watcher"
prometheus | ts=2024-11-13T17:31:19.114Z caller=dedupe.go:112 component=remote level=info remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Replaying WAL" queue=84794c
prometheus | ts=2024-11-13T17:31:19.122Z caller=main.go:1441 level=info msg="updated GOGC" old=100 new=75
prometheus | ts=2024-11-13T17:31:19.123Z caller=main.go:1452 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yaml totalDuration=21.144045ms db_storage=2.928µs remote_storage=7.413605ms web_handler=1.031µs query_engine=359.421µs scrape=6.487095ms scrape_sd=634.937µs notify=1.551µs notify_sd=1.767µs rules=556.627µs tracing=9.265µs
prometheus | ts=2024-11-13T17:31:19.125Z caller=main.go:1182 level=info msg="Server is ready to receive web requests."
prometheus | ts=2024-11-13T17:31:19.125Z caller=manager.go:164 level=info component="rule manager" msg="Starting rule manager..."
prometheus | ts=2024-11-13T17:31:26.970Z caller=dedupe.go:112 component=remote level=info remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Done replaying WAL" duration=7.855803409s
prometheus | ts=2024-11-13T17:31:34.161Z caller=dedupe.go:112 component=remote level=warn remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: context deadline exceeded\n"
prometheus | ts=2024-11-13T17:32:37.419Z caller=dedupe.go:112 component=remote level=warn remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: context deadline exceeded\n"
prometheus | ts=2024-11-13T17:33:38.505Z caller=dedupe.go:112 component=remote level=warn remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: context deadline exceeded\n"
Thanos docker compose file
services:
  thanos-receiver:
    container_name: thanos-receiver
    image: thanosio/thanos:v0.36.0
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
    command:
      - receive
      - --grpc-address=0.0.0.0:10907 # Use gRPC for communication
      - --http-address=0.0.0.0:10909 # Optional: Enable HTTP if needed
      - --remote-write.address=0.0.0.0:10908
      - --log.level=debug
      - --tsdb.path=/data
      - --receive.hashrings-file=/etc/thanos/hashring.json
      - --objstore.config-file=/etc/thanos/minio.yaml
      - --label=receive_replica="01"
    ports:
      - "10907:10907"
      - "10909:10909"
      - "10908:10908"
      - "19391:19391"
    volumes:
      - ./config/hashring.json:/etc/thanos/hashring.json
      - ./config/minio.yaml:/etc/thanos/minio.yaml
      - ./data/receiver:/data
    networks:
      - thanos-net
  thanos-store:
    container_name: thanos-store
    image: thanosio/thanos:v0.36.0
    command:
      - store
      - --grpc-address=0.0.0.0:10901
      - --objstore.config-file=/etc/thanos/minio.yaml
      - --data-dir=/data
    ports:
      - "10901:10901"
    volumes:
      - ./config/minio.yaml:/etc/thanos/minio.yaml
      - ./data/store:/data
    networks:
      - thanos-net
  thanos-querier:
    container_name: thanos-querier
    image: thanosio/thanos:v0.36.0
    command:
      - query
      - --http-address=0.0.0.0:9090
      - --endpoint=thanos-store:10901
    ports:
      - "10904:10904" # Query HTTP port
      - "9999:9090"
    networks:
      - thanos-net
  thanos-query-frontend:
    container_name: thanos-query-frontend
    image: thanosio/thanos:v0.36.0
    command:
      - query-frontend
      - --http-address=0.0.0.0:9095
      - --query-frontend.downstream-url=http://thanos-querier:9090
      - --log.level=debug
    ports:
      - "10905:9095" # Query Frontend HTTP port
    networks:
      - thanos-net
networks:
  thanos-net:
    driver: bridge
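One thing worth noting about this setup: the receiver log above reports mode=RouterOnly, meaning requests are only routed to the hashring endpoints over gRPC rather than ingested directly. If the goal is a single receiver that both routes and ingests, the Thanos docs describe also passing --receive.local-endpoint with a value matching the hashring entry so the receiver recognizes itself. A minimal sketch of the receive command with that flag added (not tested in this setup):

command:
  - receive
  - --grpc-address=0.0.0.0:10907
  - --http-address=0.0.0.0:10909
  - --remote-write.address=0.0.0.0:10908
  - --log.level=debug
  - --tsdb.path=/data
  - --receive.hashrings-file=/etc/thanos/hashring.json
  - --receive.local-endpoint=thanos-receiver:10907   # must match the endpoint in hashring.json
  - --objstore.config-file=/etc/thanos/minio.yaml
  - --label=receive_replica="01"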
Prometheus docker compose file
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_INSTALL_PLUGINS=grafana-clock-panel
    ports:
      - '3000:3000'
    volumes:
      - grafana-storage:/var/lib/grafana
  prometheus:
    image: docker.io/prom/prometheus:latest
    container_name: prometheus
    ports:
      - 9090:9090
    command: "--config.file=/etc/prometheus/prometheus.yaml"
    volumes:
      - ./config/prometheus.yaml:/etc/prometheus/prometheus.yaml:ro
      - prometheus-data:/prometheus
    restart: unless-stopped
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    privileged: true
    devices:
      - /dev/kmsg
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
volumes:
  grafana-storage: {}
  prometheus-data:
    driver: local
hashring.json file
[
  {
    "endpoints": [
      "thanos-receiver:10907"
    ]
  }
]
Prometheus config file
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  # external_labels:
  #   monitor: 'codelab-monitor'

# Remote write configuration to send data to Thanos Receiver
remote_write:
  - url: 'http://192.168.2.237:10908/api/v1/receive'
    remote_timeout: 30s

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  # Example job for node_exporter
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.2.237:9100']
  # Example job for cadvisor
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.2.237:8080']
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - '192.168.2.92:30081'
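On the Prometheus side, remote_timeout only bounds each individual request to /api/v1/receive; batch size and send parallelism are controlled by queue_config. For reference, a sketch of what that block looks like (the numbers are illustrative, not a tuning recommendation):

remote_write:
  - url: 'http://192.168.2.237:10908/api/v1/receive'
    remote_timeout: 30s
    queue_config:
      capacity: 2500
      max_shards: 50
      max_samples_per_send: 500
      batch_send_deadline: 5s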
I fixed the problem by changing the endpoint in the hashring.json file:
thanos-receiver:10907
-> 127.0.0.1:10907
and it works fine now.
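For completeness, the working hashring.json:

[
  {
    "endpoints": [
      "127.0.0.1:10907"
    ]
  }
]

Presumably the receiver's self-forward over the docker network alias was what kept timing out, while the loopback address keeps that forward inside the container.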