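I am reading data from Kafka in Spark, applying some transformations, and then loading the result into a database.
When I start the job with spark-submit, I can see that it launches the number of containers specified in the spark-submit command. However, in the ResourceManager UI, only the container running the driver process has logs from my application. In the rest of the containers (the executors), I see only GC-related logs.
My question is: is my cluster being fully utilized, or is only one container doing the actual work while the rest sit idle?
I am reading from about 50 topics, and each topic is partitioned (at least 4 partitions).
I am using EMR 6.5, which ships with Spark 3.1.2 and Scala 2.12.14.
My spark-submit command: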
spark-submit \
  --name read-from-kafka \
  --deploy-mode cluster \
  --master yarn \
  --conf spark.eventLog.enabled=false \
  --conf spark.sql.caseSensitive=true \
  --conf spark.sql.shuffle.partitions=50 \
  --conf spark.driver.memory=5300M \
  --class com.XXX.XXX.reports.unifiedLoader \
  --jars file:////home/hadoop/lib/* \
  --executor-memory 5600M \
  --conf "spark.alert.duration=4" \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=1 \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=6 \
  --executor-cores 2 \
  --files /home/hadoop/reports/log4j.properties,/home/hadoop/reports/application_commonJob_cdc.conf,/home/hadoop/reports/sendalert.sh \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -XX:OnOutOfMemoryError='kill -9 %p'" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  radfromkafka_2.12-1.0.jar application_commonJob_cdc.conf
Cluster configuration: 3 worker nodes with 32 GB of memory and 8 vCPUs each, and 1 master node with 32 GB of memory and 8 vCPUs.
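As a rough back-of-the-envelope check, the numbers above can be compared directly. The sketch below only does arithmetic on values from the spark-submit command and the cluster description. It assumes that all topics are read by a single job and that Spark creates one task per Kafka partition per batch; neither is stated in the question, so treat the result as an estimate, not a measurement.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope capacity check using the numbers from the question.
max_executors=6             # spark.dynamicAllocation.maxExecutors
executor_cores=2            # --executor-cores
workers=3                   # worker nodes
vcpus_per_worker=8          # vCPUs per worker node
topics=50                   # approximate topic count
min_partitions_per_topic=4  # "at least 4 partitions"

# Upper bound on tasks that can run at the same time.
task_slots=$(( max_executors * executor_cores ))
worker_vcpus=$(( workers * vcpus_per_worker ))
min_kafka_partitions=$(( topics * min_partitions_per_topic ))

# Assumption: one Spark task per Kafka partition per batch.
# Ceiling division gives the number of task "waves" per batch.
waves=$(( (min_kafka_partitions + task_slots - 1) / task_slots ))

echo "concurrent task slots : $task_slots of $worker_vcpus worker vCPUs"
echo "Kafka partitions (min): $min_kafka_partitions"
echo "task waves per batch  : $waves"
```

Under these assumptions, even with all 6 executors running, at most 12 tasks execute concurrently, which is half of the 24 worker vCPUs. Whether dynamic allocation actually scales up from 1 executor to 6 depends on the backlog of pending tasks.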
Below are the logs from an executor container:
2024-04-30T14:00:28.018+0000: [GC (Allocation Failure) [PSYoungGen: 126464K->12782K(147456K)] 126464K->12798K(484864K), 0.0088915 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2024-04-30T14:00:28.258+0000: [GC (Metadata GC Threshold) [PSYoungGen: 64103K->8546K(147456K)] 64119K->8570K(484864K), 0.0057556 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2024-04-30T14:00:28.264+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 8546K->0K(147456K)] [ParOldGen: 24K->8255K(202752K)] 8570K->8255K(350208K), [Metaspace: 20274K->20274K(1067008K)], 0.0303067 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
2024-04-30T14:00:29.062+0000: [GC (Allocation Failure) [PSYoungGen: 126464K->8167K(147456K)] 134719K->16431K(350208K), 0.0034351 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2024-04-30T14:00:29.320+0000: [GC (Metadata GC Threshold) [PSYoungGen: 98819K->7312K(198656K)] 107083K->15583K(401408K), 0.0050790 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2024-04-30T14:00:29.325+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 7312K->0K(198656K)] [ParOldGen: 8271K->13221K(326656K)] 15583K->13221K(525312K), [Metaspace: 33825K->33822K(1079296K)], 0.0266719 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
2024-04-30T14:00:29.852+0000: [GC (Allocation Failure) [PSYoungGen: 177664K->7411K(198656K)] 190885K->20641K(525312K), 0.0062175 secs] [Times: user=0.02 sys=0.01, real=0.00 secs]
2024-04-30T14:00:38.112+0000: [GC (Allocation Failure) [PSYoungGen: 185075K->9928K(244736K)] 198305K->23166K(571392K), 0.0067299 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2024-04-30T14:00:38.970+0000: [GC (Metadata GC Threshold) [PSYoungGen: 208351K->13300K(255488K)] 221589K->28096K(582144K), 0.0106491 secs] [Times: user=0.05 sys=0.01, real=0.01 secs]
2024-04-30T14:00:38.981+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 13300K->0K(255488K)] [ParOldGen: 14795K->23697K(455168K)] 28096K->23697K(710656K), [Metaspace: 53584K->52539K(1099776K)], 0.0873161 secs] [Times: user=0.37 sys=0.00, real=0.09 secs]
2024-04-30T14:00:40.097+0000: [GC (Allocation Failure) [PSYoungGen: 242176K->8165K(329216K)] 265873K->31871K(784384K), 0.0062505 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
2024-04-30T14:00:42.873+0000: [GC (Allocation Failure) [PSYoungGen: 321091K->15345K(328704K)] 344796K->172435K(783872K), 0.0319163 secs] [Times: user=0.13 sys=0.03, real=0.03 secs]
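One way to get a feel for how much GC activity such a log represents is to summarize it. The sketch below is only an illustration: it embeds three entries copied from the log above as sample input, counts minor and full collections, and sums the wall-clock pause times from the `real=` field. The variable names are made up for this example; for a real run, the sample would be replaced by the contents of the executor's stdout file.

```shell
#!/usr/bin/env bash
# Sample input: three entries copied verbatim from the executor log above.
gc_log=$(cat <<'EOF'
2024-04-30T14:00:28.018+0000: [GC (Allocation Failure) [PSYoungGen: 126464K->12782K(147456K)] 126464K->12798K(484864K), 0.0088915 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2024-04-30T14:00:28.264+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 8546K->0K(147456K)] [ParOldGen: 24K->8255K(202752K)] 8570K->8255K(350208K), [Metaspace: 20274K->20274K(1067008K)], 0.0303067 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
2024-04-30T14:00:42.873+0000: [GC (Allocation Failure) [PSYoungGen: 321091K->15345K(328704K)] 344796K->172435K(783872K), 0.0319163 secs] [Times: user=0.13 sys=0.03, real=0.03 secs]
EOF
)

# Count full collections and minor (young-generation) collections.
full_gc_count=$(printf '%s\n' "$gc_log" | grep -c '\[Full GC')
minor_gc_count=$(printf '%s\n' "$gc_log" | grep -c '\[GC (')

# Sum the wall-clock pause ("real=... secs") of every entry.
total_pause=$(printf '%s\n' "$gc_log" \
  | sed -n 's/.*real=\([0-9.]*\) secs.*/\1/p' \
  | awk '{ s += $1 } END { printf "%.2f", s }')

echo "minor GCs: $minor_gc_count, full GCs: $full_gc_count, total pause: ${total_pause}s"
```

A summary like this only describes JVM memory activity. It does not show whether the executor is running tasks, so it cannot answer the utilization question on its own.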
You need to enable the Spark UI to see how the cluster is being utilized.