我有一个用 scala 编写的 flink 作业,我正在创建一个自定义指标来计算流中的事件数量。该作业部署在 kubernetes 上,我在 prometheus 中看到作业管理器和任务管理器的系统指标。然而,尽管我们在 Flink UI 中看到了自定义指标,但我们在 Prometheus 中看不到自定义指标。以下是自定义指标代码:
val sampleProcessFunction = new ProcessFunction[String, String] {
@transient private var counter: Counter = _
override def open(parameters: Configuration): Unit =
counter = getRuntimeContext.getMetricGroup.addGroup("abc").counter("streamcounter")
override def processElement(
value: String,
ctx: ProcessFunction[String, String]#Context,
out: Collector[String]): Unit = {
val result = value.parseJson.toString
counter.inc()
out.collect(result)
}
}
flink-config.yaml 有以下与 prometheus 相关的条目:
taskmanager.network.detailed-metrics: true
metrics.reporter.prom.class:org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 8080
不仅是自定义指标,任何遵循路径 taskmanager.job.* 的任务管理器指标都不会在指标端点中公开。当我进入任务管理器 Pod 并像这样对指标端点进行卷曲时:
kubectl exec -it flink-taskmanager-app-7448cdb787-9c48j -- /bin/bash
curl http://localhost:8080/metrics
我只获取与任务管理器相关的状态指标:
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed MemoryUsed (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed gauge
flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Flink_Memory_Managed_Used Used (scope: taskmanager_Status_Flink_Memory_Managed)
# TYPE flink_taskmanager_Status_Flink_Memory_Managed_Used gauge
flink_taskmanager_Status_Flink_Memory_Managed_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments UsedMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Network_TotalMemorySegments TotalMemorySegments (scope: taskmanager_Status_Network)
# TYPE flink_taskmanager_Status_Network_TotalMemorySegments gauge
flink_taskmanager_Status_Network_TotalMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_Shuffle_Netty_AvailableMemory AvailableMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_AvailableMemory gauge
flink_taskmanager_Status_Shuffle_Netty_AvailableMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.84252416E8
# HELP flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded ClassesLoaded (scope: taskmanager_Status_JVM_ClassLoader)
# TYPE flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded gauge
flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 11075.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Max Max (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Max gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 2.68435456E8
# HELP flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage RequestedMemoryUsage (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage gauge
flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments AvailableMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Used Used (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Used gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 6.5252976E7
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Max Max (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Max gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 7.80140544E8
# HELP flink_taskmanager_Status_JVM_Memory_Direct_Count Count (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_Count gauge
flink_taskmanager_Status_JVM_Memory_Direct_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30065.0
# HELP flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.85225216E8
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time Time (scope: taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Threads_Count Count (scope: taskmanager_Status_JVM_Threads)
# TYPE flink_taskmanager_Status_JVM_Threads_Count gauge
flink_taskmanager_Status_JVM_Threads_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 51.0
# HELP flink_taskmanager_Status_Shuffle_Netty_TotalMemory TotalMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_TotalMemory gauge
flink_taskmanager_Status_Shuffle_Netty_TotalMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.84252416E8
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time Time (scope: taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 55.0
# HELP flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded ClassesUnloaded (scope: taskmanager_Status_JVM_ClassLoader)
# TYPE flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded gauge
flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Used Used (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Used gauge
flink_taskmanager_Status_JVM_Memory_Heap_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 1.56297264E8
# HELP flink_taskmanager_Status_JVM_CPU_Time Time (scope: taskmanager_Status_JVM_CPU)
# TYPE flink_taskmanager_Status_JVM_CPU_Time gauge
flink_taskmanager_Status_JVM_CPU_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.001E10
# HELP flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed MemoryUsed (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed gauge
flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.85225217E8
# HELP flink_taskmanager_Status_Shuffle_Netty_UsedMemory UsedMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_UsedMemory gauge
flink_taskmanager_Status_Shuffle_Netty_UsedMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count Count (scope: taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 3.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Committed Committed (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Committed gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 6.7375104E7
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Max Max (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Max gauge
flink_taskmanager_Status_JVM_Memory_Heap_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.429185024E9
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Committed Committed (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Committed gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.8787328E7
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Used Used (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Used gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.4597576E7
# HELP flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments TotalMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_Flink_Memory_Managed_Total Total (scope: taskmanager_Status_Flink_Memory_Managed)
# TYPE flink_taskmanager_Status_Flink_Memory_Managed_Total gauge
flink_taskmanager_Status_Flink_Memory_Managed_Total{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.294967296E9
# HELP flink_taskmanager_Status_JVM_CPU_Load Load (scope: taskmanager_Status_JVM_CPU)
# TYPE flink_taskmanager_Status_JVM_CPU_Load gauge
flink_taskmanager_Status_JVM_CPU_Load{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.002796347271376764
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_Count Count (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_Count gauge
flink_taskmanager_Status_JVM_Memory_Mapped_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Committed Committed (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Committed gauge
flink_taskmanager_Status_JVM_Memory_Heap_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.429185024E9
# HELP flink_taskmanager_Status_Network_AvailableMemorySegments AvailableMemorySegments (scope: taskmanager_Status_Network)
# TYPE flink_taskmanager_Status_Network_AvailableMemorySegments gauge
flink_taskmanager_Status_Network_AvailableMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity TotalCapacity (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity gauge
flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count Count (scope: taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
注意: 配置文件中没有配置显式过滤/排除。
有人可以帮助我们如何获取 taskmanager.job.* 指标(包括自定义指标)吗?
您能否分享更多您使用过的自定义指标代码,以便我们可以在上下文中更多地了解它?到目前为止您分享的内容显然不正确。或者问题可能与指标名称有关 - 您如何在 Prometheus 中查找它?
您将在 https://docs.immerok.cloud/docs/how-to-guides/development/measuring-latency/中找到自定义指标的工作示例。
注:我为 Immerok 工作。
更新:
听起来问题可能出在普罗米修斯方面。
需要检查几件事:
metrics.reporters: prom
,但上面没有分享它。https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html有点过时,但仍然相关。
注意:虽然这不是问题,但通过类配置指标报告器(如
metrics.reporter.prom.class
)已经过时很长一段时间,并在 Flink 1.16 中被弃用。这将在 1.17 中被删除。该行可以更新为
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
问题出在 flink 版本上。我使用的是 1.15.0,它在指标上报告了错误 https://lists.apache.org/thread/6bd9vmcroh7576d7h1kdcd8czf0b4l73
基本上,当作业运行时,与 taskmanager.job.* 相关的任务管理器指标就会消失。将flink升级到1.15.2后,开始正常工作。所有指标以及自定义指标都已正确导出。
(用flink1.12.0测试)这是我如何以非常简单的方式解决它的,你最好尝试一下节省时间,我在这个奇怪的问题上浪费了很多时间,不知道为什么官方文档没有说清楚, 只需从此更改您的 flink 配置:
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
对此:
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260
神奇的事情发生在端口范围上,如果您不指定端口范围,例如使用默认的 9249 或者您只指定了一个端口号,您只能看到 flink_jobmanager_xxx,但 flink_taskmanager_xxx 和您自定义的指标将不会显示,让你的笔记更清晰:
当您按照我的建议更新配置时,您将看到至少有两个端口正在监听,例如 localhost:9250 localhost:9251,并且从 localhost:9250 您将找到来自 localhost 的与 flink_jobmanager_xxx 相关的所有指标:第9251章 你会找到flink_taskmanager_xxx和你自定义的指标
如果你仍然没有看到你的自定义指标,那么你去下游的Prometheus服务器或grafana面板尝试一下,我不知道为什么,但我发现一旦Prometheus抓取了指标,自定义指标就会消失本地主机:9251 但出现在普罗米修斯