Spark on AWS EKS: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found when running in cluster mode

Question

I am trying to run a Spark job on an EKS cluster. When I run it in cluster mode, I get the following:

 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
        at org.apache.spark.util.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:317)
        at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2(DependencyUtils.scala:273)
        at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2$adapted(DependencyUtils.scala:271)
        at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
        at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
        at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
        at org.apache.spark.util.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:271)
        at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$5(SparkSubmit.scala:393)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:393)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
        ... 27 more

I am submitting from an EC2 instance. The spark-submit looks like this:

spark-3.5.3-bin-hadoop3/bin/spark-submit \
--master k8s://https://aws.cluster:443 \
--deploy-mode cluster \
--name test1 \
--verbose \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.endpoint=s3.region.amazonaws.com \
--conf spark.hadoop.fs.s3a.access.key=access_key \
--conf spark.hadoop.fs.s3a.secret.key=secret_key \
--conf spark.kubernetes.container.image=ecr/spark-py:3.5.3 \
--conf spark.driver.extraClassPath="/opt/spark/jars/hadoop-aws-3.3.4.jar:/opt/spark/jars/aws-java-sdk-bundle-1.12.180.jar" \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--files 's3a://bucket/file1.jpg,s3a://bucket/file2.jpg' \
--py-files s3a://bucket/py-files.zip \
s3a://bucket/spark-application.py

When I submit the Spark job, the cluster starts the driver, and that is where I get the error. I have made sure the appropriate jars are on $SPARK_CLASSPATH and that the jars are the correct versions.
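(For reference, one quick way to confirm which S3A-related jars are actually inside the image is to list the jars directory. This is a hypothetical check that assumes the image name and the /opt/spark/jars layout shown in the submit command above.)

# Hypothetical check: list the S3A-related jars baked into the image.
# Image name and path are taken from the question; adjust as needed.
docker run --rm ecr/spark-py:3.5.3 ls /opt/spark/jars | grep -E 'hadoop-aws|aws-java-sdk'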

When I shell into the exact same container that gets started on the EKS node and run the exact same Spark job in client mode instead of cluster mode, I get no error and the job runs successfully. Here is an example of that spark-submit:

docker run -it ecr/spark-py:3.5.3 /bin/bash
/opt/spark/bin/spark-submit \
--deploy-mode client \
--name test1 \
--verbose \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.endpoint=s3.region.amazonaws.com \
--conf spark.hadoop.fs.s3a.access.key=access_key \
--conf spark.hadoop.fs.s3a.secret.key=secret_key \
--conf spark.kubernetes.container.image=ecr/spark-py:3.5.3 \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--files 's3a://bucket/file1.jpg,s3a://bucket/file2.jpg' \
--py-files s3a://bucket/py-files.zip \
s3a://bucket/spark-application.py

Here is the result:


24/12/16 18:56:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
24/12/16 18:56:54 INFO SparkContext: Added file s3a://bucket/file1.jpg at s3a://bucket/file1.jpg with timestamp 1734375412862
24/12/16 18:56:54 INFO Utils: Fetching s3a://bucket/file1.jpg to /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/fetchFileTemp16712750277348807188.tmp
24/12/16 18:56:54 INFO SparkContext: Added file s3a://bucket/file2.jpg at s3a://bucket/file2.jpg with timestamp 1734375412862
24/12/16 18:56:54 INFO Utils: Fetching s3a://bucket/file2.jpg to /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/fetchFileTemp9918504351183826242.tmp
24/12/16 18:56:55 INFO SparkContext: Added file s3a://bucket/pyfiles.zip at s3a://bucket/pyfiles.zip with timestamp 1734375412862
24/12/16 18:56:55 INFO Utils: Fetching s3a://bucket/pyfiles.zip to /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/fetchFileTemp2697949075798797977.tmp
24/12/16 18:56:55 INFO Executor: Starting executor ID driver on host 1103cb8139a4
24/12/16 18:56:55 INFO Executor: OS info Linux, 6.8.0-1019-aws, amd64
24/12/16 18:56:55 INFO Executor: Java version 17.0.13
24/12/16 18:56:55 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
24/12/16 18:56:55 INFO Executor: Created or updated repl class loader org.apache.spark.util.MutableURLClassLoader@51434498 for default.
24/12/16 18:56:55 INFO Executor: Fetching s3a://bucket/pyfiles.zip with timestamp 1734375412862
24/12/16 18:56:55 INFO Utils: Fetching s3a://bucket/pyfiles.zip to /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/fetchFileTemp412463927524377149.tmp
24/12/16 18:56:55 INFO Utils: /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/fetchFileTemp412463927524377149.tmp has been previously copied to /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/test-spark_files_asn.zip
24/12/16 18:56:55 INFO Executor: Fetching s3a://bucket/file1.jpg with timestamp 1734375412862
24/12/16 18:56:55 INFO Utils: Fetching s3a://bucket/file1.jpg to /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/fetchFileTemp4019774831140707035.tmp
24/12/16 18:56:55 INFO Utils: /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/fetchFileTemp4019774831140707035.tmp has been previously copied to /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/GeoLite2-ASN.mmdb
24/12/16 18:56:55 INFO Executor: Fetching s3a://bucket/file2.jpg with timestamp 1734375412862
24/12/16 18:56:55 INFO Utils: Fetching s3a://bucket/file2.jpg to /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/fetchFileTemp11601213107382369682.tmp
24/12/16 18:56:55 INFO Utils: /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e83b/userFiles-c9ca1323-4933-4cec-9066-c4437830d6dc/fetchFileTemp11601213107382369682.tmp has been previously copied to /tmp/spark-83e76e50-dab8-4f2e-9138-d58d1f69e8

I don't understand why it works in client mode on the container but does not work in cluster mode with the same container image.

apache-spark hadoop amazon-eks spark-submit s3a-committers
1 Answer

In cluster mode, the driver runs on a different machine than the client, so SparkContext.addJar won't work out of the box with files that are local to the client. To make files on the client available to SparkContext.addJar, include them with the --jars option in the launch command.
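Applied to the cluster-mode command in the question, a minimal sketch of that advice would pass the two S3A jars through --jars. The local:// scheme marks files that already exist inside the container image, so Spark does not try to upload them; the jar paths below are assumptions carried over from the spark.driver.extraClassPath setting in the question:

# Sketch only: same command as in the question, with the S3A jars
# passed via --jars; local:// points at paths inside the image.
spark-3.5.3-bin-hadoop3/bin/spark-submit \
  --master k8s://https://aws.cluster:443 \
  --deploy-mode cluster \
  --name test1 \
  --conf spark.kubernetes.container.image=ecr/spark-py:3.5.3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  --jars local:///opt/spark/jars/hadoop-aws-3.3.4.jar,local:///opt/spark/jars/aws-java-sdk-bundle-1.12.180.jar \
  --py-files s3a://bucket/py-files.zip \
  s3a://bucket/spark-application.py

If the driver pod still fails while resolving the s3a:// URIs, a common alternative is to make sure hadoop-aws and the AWS SDK bundle sit in the image's /opt/spark/jars directory itself (or to pull them with --packages org.apache.hadoop:hadoop-aws:3.3.4), so the classes are on spark-submit's own classpath before any s3a:// path is touched.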

Source: the Apache Spark documentation on submitting applications ("Advanced Dependency Management").
