无法看到 Prometheus 的 flink 自定义指标

问题描述 投票:0回答:3

我有一个用 scala 编写的 flink 作业,我正在创建一个自定义指标来计算流中的事件数量。该作业部署在 kubernetes 上,我在 prometheus 中看到作业管理器和任务管理器的系统指标。然而,尽管我们在 Flink UI 中看到了自定义指标,但我们在 Prometheus 中看不到自定义指标。以下是自定义指标代码:

    val sampleProcessFunction = new ProcessFunction[String, String] {
    @transient private var counter: Counter = _
    override def open(parameters: Configuration): Unit =
      counter = getRuntimeContext.getMetricGroup.addGroup("abc").counter("streamcounter")

    override def processElement(
                                 value: String,
                                 ctx: ProcessFunction[String, String]#Context,
                                 out: Collector[String]): Unit = {
      
        val result = value.parseJson.toString
        counter.inc()
        out.collect(result)
      
    }
}

  

flink-config.yaml 有以下与 prometheus 相关的条目:

   taskmanager.network.detailed-metrics: true
   metrics.reporter.prom.class:org.apache.flink.metrics.prometheus.PrometheusReporter
   metrics.reporter.prom.port: 8080

不仅是自定义指标,任何遵循路径 taskmanager.job.* 的任务管理器指标都不会在指标端点中公开。当我进入任务管理器 Pod 并像这样对指标端点进行卷曲时:

kubectl exec -it flink-taskmanager-app-7448cdb787-9c48j -- /bin/bash
curl http://localhost:8080/metrics

我只获取与任务管理器相关的状态指标:

# HELP flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed MemoryUsed (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed gauge
flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Flink_Memory_Managed_Used Used (scope: taskmanager_Status_Flink_Memory_Managed)
# TYPE flink_taskmanager_Status_Flink_Memory_Managed_Used gauge
flink_taskmanager_Status_Flink_Memory_Managed_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments UsedMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Network_TotalMemorySegments TotalMemorySegments (scope: taskmanager_Status_Network)
# TYPE flink_taskmanager_Status_Network_TotalMemorySegments gauge
flink_taskmanager_Status_Network_TotalMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_Shuffle_Netty_AvailableMemory AvailableMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_AvailableMemory gauge
flink_taskmanager_Status_Shuffle_Netty_AvailableMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.84252416E8
# HELP flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded ClassesLoaded (scope: taskmanager_Status_JVM_ClassLoader)
# TYPE flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded gauge
flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 11075.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Max Max (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Max gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 2.68435456E8
# HELP flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage RequestedMemoryUsage (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage gauge
flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments AvailableMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Used Used (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Used gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 6.5252976E7
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Max Max (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Max gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 7.80140544E8
# HELP flink_taskmanager_Status_JVM_Memory_Direct_Count Count (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_Count gauge
flink_taskmanager_Status_JVM_Memory_Direct_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30065.0
# HELP flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.85225216E8
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time Time (scope: taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Threads_Count Count (scope: taskmanager_Status_JVM_Threads)
# TYPE flink_taskmanager_Status_JVM_Threads_Count gauge
flink_taskmanager_Status_JVM_Threads_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 51.0
# HELP flink_taskmanager_Status_Shuffle_Netty_TotalMemory TotalMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_TotalMemory gauge
flink_taskmanager_Status_Shuffle_Netty_TotalMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.84252416E8
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time Time (scope: taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 55.0
# HELP flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded ClassesUnloaded (scope: taskmanager_Status_JVM_ClassLoader)
# TYPE flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded gauge
flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Used Used (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Used gauge
flink_taskmanager_Status_JVM_Memory_Heap_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 1.56297264E8
# HELP flink_taskmanager_Status_JVM_CPU_Time Time (scope: taskmanager_Status_JVM_CPU)
# TYPE flink_taskmanager_Status_JVM_CPU_Time gauge
flink_taskmanager_Status_JVM_CPU_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.001E10
# HELP flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed MemoryUsed (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed gauge
flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.85225217E8
# HELP flink_taskmanager_Status_Shuffle_Netty_UsedMemory UsedMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_UsedMemory gauge
flink_taskmanager_Status_Shuffle_Netty_UsedMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count Count (scope: taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 3.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Committed Committed (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Committed gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 6.7375104E7
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Max Max (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Max gauge
flink_taskmanager_Status_JVM_Memory_Heap_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.429185024E9
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Committed Committed (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Committed gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.8787328E7
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Used Used (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Used gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.4597576E7
# HELP flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments TotalMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_Flink_Memory_Managed_Total Total (scope: taskmanager_Status_Flink_Memory_Managed)
# TYPE flink_taskmanager_Status_Flink_Memory_Managed_Total gauge
flink_taskmanager_Status_Flink_Memory_Managed_Total{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.294967296E9
# HELP flink_taskmanager_Status_JVM_CPU_Load Load (scope: taskmanager_Status_JVM_CPU)
# TYPE flink_taskmanager_Status_JVM_CPU_Load gauge
flink_taskmanager_Status_JVM_CPU_Load{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.002796347271376764
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_Count Count (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_Count gauge
flink_taskmanager_Status_JVM_Memory_Mapped_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Committed Committed (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Committed gauge
flink_taskmanager_Status_JVM_Memory_Heap_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.429185024E9
# HELP flink_taskmanager_Status_Network_AvailableMemorySegments AvailableMemorySegments (scope: taskmanager_Status_Network)
# TYPE flink_taskmanager_Status_Network_AvailableMemorySegments gauge
flink_taskmanager_Status_Network_AvailableMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity TotalCapacity (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity gauge
flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count Count (scope: taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0

注意: 配置文件中没有配置显式过滤/排除。
有人可以帮助我们如何获取 taskmanager.job.* 指标(包括自定义指标)吗?

configuration apache-flink prometheus metrics
3个回答
0
投票

您能否分享更多您使用过的自定义指标代码,以便我们可以在上下文中更多地了解它?到目前为止您分享的内容显然不正确。或者问题可能与指标名称有关 - 您如何在 Prometheus 中查找它?

您将在 https://docs.immerok.cloud/docs/how-to-guides/development/measuring-latency/中找到自定义指标的工作示例。

注:我为 Immerok 工作。

更新:

听起来问题可能出在普罗米修斯方面。

需要检查几件事:

  1. 我假设你的配置中有一行写着
    metrics.reporters: prom
    ,但上面没有分享它。
  2. 确保您尚未将 Flink 配置为排除某些指标发送到 Prometheus。 (如果您上面分享的配置是完整的,那么这不是问题。)
  3. 检查 Prometheus 配置以查看它正在抓取哪些指标(以及抓取频率)。

https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html有点过时,但仍然相关。

注意:虽然这不是问题,但通过类配置指标报告器(如

metrics.reporter.prom.class
)已经过时很长一段时间,并在 Flink 1.16 中被弃用。这将在 1.17 中被删除。该行可以更新为

metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory

0
投票

问题出在 flink 版本上。我使用的是 1.15.0,它在指标上报告了错误 https://lists.apache.org/thread/6bd9vmcroh7576d7h1kdcd8czf0b4l73

基本上,当作业运行时,与 taskmanager.job.* 相关的任务管理器指标就会消失。将flink升级到1.15.2后,开始正常工作。所有指标以及自定义指标都已正确导出。


0
投票

(用flink1.12.0测试)这是我如何以非常简单的方式解决它的,你最好尝试一下节省时间,我在这个奇怪的问题上浪费了很多时间,不知道为什么官方文档没有说清楚, 只需从此更改您的 flink 配置:

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter

对此:

metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260

神奇的事情发生在端口范围上,如果您不指定端口范围,例如使用默认的 9249 或者您只指定了一个端口号,您只能看到 flink_jobmanager_xxx,但 flink_taskmanager_xxx 和您自定义的指标将不会显示,让你的笔记更清晰:

  1. 当您按照我的建议更新配置时,您将看到至少有两个端口正在监听,例如 localhost:9250 localhost:9251,并且从 localhost:9250 您将找到来自 localhost 的与 flink_jobmanager_xxx 相关的所有指标:第9251章 你会找到flink_taskmanager_xxx和你自定义的指标

  2. 如果你仍然没有看到你的自定义指标,那么你去下游的Prometheus服务器或grafana面板尝试一下,我不知道为什么,但我发现一旦Prometheus抓取了指标,自定义指标就会消失本地主机:9251 但出现在普罗米修斯

© www.soinside.com 2019 - 2024. All rights reserved.