Prometheus Thanos receiver error: context deadline exceeded

Problem description

I'm trying to set up remote write to a Thanos receiver with docker compose on a virtual machine, but for some reason the receive API endpoint fails as soon as Prometheus sends a POST request:

Error 500, message: context deadline exceeded

which apparently means something is timing out somewhere.

What I have tried:

Tried increasing the remote write timeout value in the Prometheus config (the remote_write block below). It didn't help.

Added resource limits for the thanos receive container (see the compose file below). That didn't help either.

PS: Prometheus is also deployed with docker compose, on a different docker network, but I joined/connected the Prometheus container to the thanos docker network (roughly as sketched below).
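
For reference, a minimal sketch of how that attachment can be declared on the Prometheus side, assuming the thanos compose project runs from a directory named thanos, so compose names the network thanos_thanos-net (check the real name with docker network ls):

# Sketch: join the Prometheus service to the network created by the thanos stack.
services:
  prometheus:
    # ... existing prometheus service definition ...
    networks:
      - thanos-net

networks:
  thanos-net:
    name: thanos_thanos-net   # assumed name; compose prefixes the project directory name
    external: true            # join the existing network instead of creating a new one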

Thanos receiver logs

thanos-receiver  | ts=2024-11-13T17:23:56.550008182Z caller=main.go:77 level=debug msg="maxprocs: Updating GOMAXPROCS=[2]: determined from CPU quota"
thanos-receiver  | ts=2024-11-13T17:23:56.550415117Z caller=receive.go:137 level=info component=receive mode=RouterOnly msg="running receive"
thanos-receiver  | ts=2024-11-13T17:23:56.550436609Z caller=options.go:26 level=info component=receive protocol=HTTP msg="disabled TLS, key and cert must be set to enable"
thanos-receiver  | ts=2024-11-13T17:23:56.55051533Z caller=receive.go:747 level=info component=receive msg="default tenant data dir already present, not attempting to migrate storage"
thanos-receiver  | ts=2024-11-13T17:23:56.550591853Z caller=handler.go:148 level=info component=receive component=receive-handler msg="Starting receive handler with async forward workers" workers=5
thanos-receiver  | ts=2024-11-13T17:23:56.550728876Z caller=receive.go:292 level=debug component=receive msg="setting up hashring"
thanos-receiver  | ts=2024-11-13T17:23:56.550873095Z caller=receive.go:299 level=debug component=receive msg="setting up HTTP server"
thanos-receiver  | ts=2024-11-13T17:23:56.550896668Z caller=receive.go:317 level=debug component=receive msg="setting up gRPC server"
thanos-receiver  | ts=2024-11-13T17:23:56.550907324Z caller=options.go:26 level=info component=receive protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
thanos-receiver  | ts=2024-11-13T17:23:56.551241139Z caller=receive.go:389 level=debug component=receive msg="setting up receive HTTP handler"
thanos-receiver  | ts=2024-11-13T17:23:56.551257544Z caller=receive.go:418 level=debug component=receive msg="setting up periodic tenant pruning"
thanos-receiver  | ts=2024-11-13T17:23:56.551269263Z caller=receive.go:455 level=info component=receive msg="starting receiver"
thanos-receiver  | ts=2024-11-13T17:23:56.55280892Z caller=intrumentation.go:75 level=info component=receive msg="changing probe status" status=healthy
thanos-receiver  | ts=2024-11-13T17:23:56.552880529Z caller=http.go:73 level=info component=receive service=http/server component=receive msg="listening for requests and metrics" address=0.0.0.0:10909
thanos-receiver  | ts=2024-11-13T17:23:56.553293788Z caller=tls_config.go:313 level=info component=receive service=http/server component=receive msg="Listening on" address=[::]:10909
thanos-receiver  | ts=2024-11-13T17:23:56.553767727Z caller=tls_config.go:316 level=info component=receive service=http/server component=receive msg="TLS is disabled." http2=false address=[::]:10909
thanos-receiver  | ts=2024-11-13T17:23:56.553524411Z caller=receive.go:376 level=info component=receive msg="listening for StoreAPI and WritableStoreAPI gRPC" address=0.0.0.0:10907
thanos-receiver  | ts=2024-11-13T17:23:56.553849437Z caller=intrumentation.go:75 level=info component=receive msg="changing probe status" status=healthy
thanos-receiver  | ts=2024-11-13T17:23:56.553552921Z caller=handler.go:407 level=info component=receive component=receive-handler msg="Start listening for connections" address=0.0.0.0:10908
thanos-receiver  | ts=2024-11-13T17:23:56.554509516Z caller=handler.go:425 level=info component=receive component=receive-handler msg="Serving plain HTTP" address=0.0.0.0:10908
thanos-receiver  | ts=2024-11-13T17:23:56.553732526Z caller=config.go:288 level=debug component=receive component=config-watcher msg="refreshed hashring config"
thanos-receiver  | ts=2024-11-13T17:23:56.554569921Z caller=receive.go:546 level=info component=receive msg="Set up hashring for the given hashring config."
thanos-receiver  | ts=2024-11-13T17:23:56.554589272Z caller=intrumentation.go:56 level=info component=receive msg="changing probe status" status=ready
thanos-receiver  | ts=2024-11-13T17:23:56.554442861Z caller=grpc.go:167 level=info component=receive service=gRPC/server component=receive msg="listening for serving gRPC" address=0.0.0.0:10907
thanos-receiver  | ts=2024-11-13T17:24:06.431193468Z caller=handler.go:584 level=debug component=receive component=receive-handler tenant=default-tenant msg="failed to handle request" err="context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.431293553Z caller=handler.go:595 level=error component=receive component=receive-handler tenant=default-tenant err="context deadline exceeded" msg="internal server error"
thanos-receiver  | ts=2024-11-13T17:24:06.432118339Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.489258283Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.489332376Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.520580431Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.528166389Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.528767937Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.528783156Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.53874829Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.538775407Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.539757427Z caller=handler.go:1011 level=debug component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"
thanos-receiver  | ts=2024-11-13T17:24:06.539774847Z caller=handler.go:764 level=debug component=receive component=receive-handler tenant=default-tenant msg="request failed, but not needed to achieve quorum" err="forwarding request to endpoint thanos-receiver:10907: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
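
Note that the deadline in these lines fires while the receiver forwards the write to its own hashring endpoint (thanos-receiver:10907), i.e. on the receiver side, after Prometheus has already delivered the request. If that forwarding deadline itself needed raising, a sketch of the receive flag that should control it (an assumption; verify the flag name against thanos receive --help for v0.36.0):

command:
  - receive
  # ... existing flags ...
  - --receive.forward.timeout=30s   # assumed flag for the per-request forward deadline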

Prometheus logs

prometheus  | ts=2024-11-13T17:31:18.019Z caller=head.go:714 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=122.273211ms
prometheus  | ts=2024-11-13T17:31:18.022Z caller=head.go:722 level=info component=tsdb msg="Replaying WAL, this may take a while"
prometheus  | ts=2024-11-13T17:31:18.127Z caller=head.go:759 level=info component=tsdb msg="WAL checkpoint loaded"
prometheus  | ts=2024-11-13T17:31:18.195Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=71 maxSegment=75
prometheus  | ts=2024-11-13T17:31:18.224Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=72 maxSegment=75
prometheus  | ts=2024-11-13T17:31:18.668Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=73 maxSegment=75
prometheus  | ts=2024-11-13T17:31:19.040Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=74 maxSegment=75
prometheus  | ts=2024-11-13T17:31:19.058Z caller=head.go:794 level=info component=tsdb msg="WAL segment loaded" segment=75 maxSegment=75
prometheus  | ts=2024-11-13T17:31:19.060Z caller=head.go:831 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=110.948691ms wal_replay_duration=927.474664ms wbl_replay_duration=231ns chunk_snapshot_load_duration=0s mmap_chunk_replay_duration=122.273211ms total_replay_duration=1.163680528s
prometheus  | ts=2024-11-13T17:31:19.100Z caller=main.go:1218 level=info fs_type=EXT4_SUPER_MAGIC
prometheus  | ts=2024-11-13T17:31:19.100Z caller=main.go:1221 level=info msg="TSDB started"
prometheus  | ts=2024-11-13T17:31:19.101Z caller=main.go:1404 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yaml
prometheus  | ts=2024-11-13T17:31:19.111Z caller=dedupe.go:112 component=remote level=info remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Starting WAL watcher" queue=84794c
prometheus  | ts=2024-11-13T17:31:19.114Z caller=dedupe.go:112 component=remote level=info remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Starting scraped metadata watcher"
prometheus  | ts=2024-11-13T17:31:19.114Z caller=dedupe.go:112 component=remote level=info remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Replaying WAL" queue=84794c
prometheus  | ts=2024-11-13T17:31:19.122Z caller=main.go:1441 level=info msg="updated GOGC" old=100 new=75
prometheus  | ts=2024-11-13T17:31:19.123Z caller=main.go:1452 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yaml totalDuration=21.144045ms db_storage=2.928µs remote_storage=7.413605ms web_handler=1.031µs query_engine=359.421µs scrape=6.487095ms scrape_sd=634.937µs notify=1.551µs notify_sd=1.767µs rules=556.627µs tracing=9.265µs
prometheus  | ts=2024-11-13T17:31:19.125Z caller=main.go:1182 level=info msg="Server is ready to receive web requests."
prometheus  | ts=2024-11-13T17:31:19.125Z caller=manager.go:164 level=info component="rule manager" msg="Starting rule manager..."
prometheus  | ts=2024-11-13T17:31:26.970Z caller=dedupe.go:112 component=remote level=info remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Done replaying WAL" duration=7.855803409s
prometheus  | ts=2024-11-13T17:31:34.161Z caller=dedupe.go:112 component=remote level=warn remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: context deadline exceeded\n"
prometheus  | ts=2024-11-13T17:32:37.419Z caller=dedupe.go:112 component=remote level=warn remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: context deadline exceeded\n"
prometheus  | ts=2024-11-13T17:33:38.505Z caller=dedupe.go:112 component=remote level=warn remote_name=84794c url=http://192.168.2.237:10908/api/v1/receive msg="Failed to send batch, retrying" err="server returned HTTP status 500 Internal Server Error: context deadline exceeded\n"

Thanos docker compose file

services:
  thanos-receiver:
    container_name: thanos-receiver
    image: thanosio/thanos:v0.36.0
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
    command:
      - receive
      - --grpc-address=0.0.0.0:10907  # Use gRPC for communication
      - --http-address=0.0.0.0:10909  # Optional: Enable HTTP if needed
      - --remote-write.address=0.0.0.0:10908
      - --log.level=debug
      - --tsdb.path=/data
      - --receive.hashrings-file=/etc/thanos/hashring.json
      - --objstore.config-file=/etc/thanos/minio.yaml
      - --label=receive_replica="01"
    ports:
      - "10907:10907"
      - "10909:10909"
      - "10908:10908"
      - "19391:19391"
    volumes:
      - ./config/hashring.json:/etc/thanos/hashring.json
      - ./config/minio.yaml:/etc/thanos/minio.yaml
      - ./data/receiver:/data
    networks:
      - thanos-net

  thanos-store:
    container_name: thanos-store
    image: thanosio/thanos:v0.36.0
    command:
      - store
      - --grpc-address=0.0.0.0:10901
      - --objstore.config-file=/etc/thanos/minio.yaml
      - --data-dir=/data
    ports:
      - "10901:10901"
    volumes:
      - ./config/minio.yaml:/etc/thanos/minio.yaml
      - ./data/store:/data
    networks:
      - thanos-net

  thanos-querier:
    container_name: thanos-querier
    image: thanosio/thanos:v0.36.0
    command:
      - query
      - --http-address=0.0.0.0:9090
      - --endpoint=thanos-store:10901
    ports:
      - "10904:10904"  # Query HTTP port
      - "9999:9090"
    networks:
      - thanos-net

  thanos-query-frontend:
    container_name: thanos-query-frontend
    image: thanosio/thanos:v0.36.0
    command:
      - query-frontend
      - --http-address=0.0.0.0:9095
      - --query-frontend.downstream-url=http://thanos-querier:9090
      - --log.level=debug
    ports:
      - "10905:9095" # Query Frontend HTTP port
    networks:
      - thanos-net

networks:
  thanos-net:
    driver: bridge

Prometheus docker compose file

services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    restart: unless-stopped
    environment:
     - GF_INSTALL_PLUGINS=grafana-clock-panel
    ports:
      - '3000:3000'
    volumes:
      - grafana-storage:/var/lib/grafana
  prometheus:
    image: docker.io/prom/prometheus:latest
    container_name: prometheus
    ports:
      - 9090:9090
    command: "--config.file=/etc/prometheus/prometheus.yaml"
    volumes:
      - ./config/prometheus.yaml:/etc/prometheus/prometheus.yaml:ro
      - prometheus-data:/prometheus
    restart: unless-stopped
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    privileged: true
    devices:
      - /dev/kmsg
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
volumes:
  grafana-storage: {}
  prometheus-data:
    driver: local

hashring.json file (mounted as /etc/thanos/hashring.json in the compose file above)

[
  {
    "endpoints": [
      "thanos-receiver:10907"
    ]
  }
]

Prometheus config file

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  # external_labels:
  #  monitor: 'codelab-monitor'

# Remote write configuration to send data to Thanos Receiver
remote_write:
  - url: 'http://192.168.2.237:10908/api/v1/receive'
    remote_timeout: 30s
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  # Example job for node_exporter
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.2.237:9100']

  # Example job for cadvisor
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.2.237:8080']

  - job_name: 'federate'
    scrape_interval: 15s

    honor_labels: true
    metrics_path: '/federate'

    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'

    static_configs:
      - targets:
        - '192.168.2.92:30081'

docker-compose prometheus monitoring thanos

1 Answer

I solved the problem by changing the endpoint in the hashring.json file:

"thanos-receiver:10907" -> "127.0.0.1:10907"

It works fine now.
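
For reference, the resulting hashring.json, assuming a single receiver routing to itself:

[
  {
    "endpoints": [
      "127.0.0.1:10907"
    ]
  }
]

Presumably this works because the receiver then forwards the write to its own gRPC listener over loopback, instead of resolving its own container name through the Docker network, which was apparently not reachable from inside the container.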
