I am trying to run a Scala/Spark application in a Kubernetes-managed cluster.
I built the jar file of my Scala/Spark application: scala-spark-1.0-jar-with-dependency.jar
I built my own Docker image from apache/spark:latest, adding my jar file:
Dockerfile:

```
FROM apache/spark:latest
COPY scala-spark-1.0-jar-with-dependency.jar /opt/spark/work-dir/
```

Build:

```
docker build . -t xxxx.cloud/spark/testavg-spark
```
I pushed the Docker image to my Docker registry: xxxx.cloud/spark/testavg-spark

```
docker push xxxx.cloud/spark/testavg-spark
```
I ran the container locally to verify that it actually contains the jar file:
```
docker run -it xxxx.cloud/spark/testavg-spark:latest bash
spark@c6ae887a6c93:/opt/spark/work-dir$ ls -lrt
total 247160
-rw-rw-r-- 1 root root 253087621 Mar  7 08:37 scala-spark-1.0-jar-with-dependency.jar
```
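As an aside, a non-interactive way to double-check that the exact path later passed with the local:// scheme (including the file name) exists inside the image could look like the following sketch; the path is simply copied from the spark-submit command further down and is shown only as an illustration:

```
# prints the jar's details if the exact path referenced by spark-submit exists in the image
docker run --rm xxxx.cloud/spark/testavg-spark:latest \
  ls -l /opt/spark/work-dir/scala-spark-1.0-jar-with-dependencies.jar
```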
The pod and the application are started on the k8s cluster with the spark-submit command:

```
spark-submit --class TestAvg \
  --master k8s://https://yyyyy:6443 \
  --deploy-mode cluster \
  --name SparkTestAvg \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image.pullSecrets=pull-secret \
  --conf spark.kubernetes.container.image=xxxx.cloud/spark/testavg-spark:latest \
  --conf spark.kubernetes.authenticate=${AUTH_KEY} \
  local:///opt/spark/work-dir/scala-spark-1.0-jar-with-dependencies.jar
```
The execution starts and then ends with an error:
```
24/03/07 10:05:08 WARN Utils: Your hostname, xxxxxx resolves to a loopback address: 127.0.1.1; using 192.168.99.159 instead (on interface enp0s31f6)
24/03/07 10:05:08 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
24/03/07 10:05:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/03/07 10:05:08 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
24/03/07 10:05:09 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
24/03/07 10:05:10 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: sparktestavg-41462f8e1828c47c-driver
namespace: default
....
24/03/07 10:05:28 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: sparktestavg-41462f8e1828c47c-driver
namespace: default
labels: spark-app-name -> sparktestavg, spark-app-selector -> spark-3e394a89a4b64a41ab92267b25b00d29, spark-role -> driver, spark-version -> 3.5.0
pod uid: 493eaaf9-c319-4eb9-bdf2-c04a30fd5498
creation time: 2024-03-07T09:05:09Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-2xq88
node name: xxxxxxxx-f49be65dc8d94931852164
start time: 2024-03-07T09:05:09Z
phase: Running
container status:
container name: spark-kubernetes-driver
container image: xxxx.cloud/spark/testavg-spark:latest
container state: terminated
container started at: 2024-03-07T09:05:23Z
container finished at: 2024-03-07T09:05:26Z
exit code: 1
termination reason: Error
```
The error is:
```
kubectl logs sparktestavg-41462f8e1828c47c-driver
Files local:///opt/spark/work-dir/scala-spark-1.0-jar-with-dependencies.jar from /opt/spark/work-dir/scala-spark-1.0-jar-with-dependencies.jar to /opt/spark/work-dir/scala-spark-1.0-jar-with-dependencies.jar
Exception in thread "main" java.nio.file.NoSuchFileException: /opt/spark/work-dir/scala-spark-1.0-jar-with-dependencies.jar
at java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
at java.base/sun.nio.fs.UnixCopyFile.copy(Unknown Source)
at java.base/sun.nio.fs.UnixFileSystemProvider.copy(Unknown Source)
at java.base/java.nio.file.Files.copy(Unknown Source)
```
Why am I getting this error? The file /opt/spark/work-dir/scala-spark-1.0-jar-with-dependency.jar is included in my Docker image.
Thanks in advance.
I am running into the same problem. I don't know why it happens, but I managed to find a workaround, which is to keep the jar in a different directory from the working directory.
For example, if my working directory is work-dir/, I put the jar into jar-dir/ and run spark-submit xxxx local:///jar-dir/xxx.jar in work-dir. It will then move xxx.jar into the working directory...
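A minimal sketch of that workaround, assuming the jar name from the question and a /jar-dir/ location picked purely for illustration (not a tested layout):

```
# In the image, copy the jar somewhere other than /opt/spark/work-dir, e.g.:
#   FROM apache/spark:latest
#   COPY scala-spark-1.0-jar-with-dependencies.jar /jar-dir/

# spark-submit then references the jar under /jar-dir/; the driver moves it
# into its working directory on startup, as described above
spark-submit --class TestAvg \
  --master k8s://https://yyyyy:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=xxxx.cloud/spark/testavg-spark:latest \
  local:///jar-dir/scala-spark-1.0-jar-with-dependencies.jar
```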
If anyone knows why this happens, please let me know as well.