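I want to connect to MongoDB with a Spark DataFrame and write Parquet files. I configured mongo-spark-connector v10.2.2 in my sbt file, and it works locally. In production, however, the Spark application reports
java.lang.ClassNotFoundException: Failed to find data source: mongodb. Please find packages at http://spark.apache.org/third-party-projects.html
(All external libraries are built into assembly.jar.)
I checked assembly.jar, and sbt did download the connector jar. Is there anything I am missing?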
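Finding the connector's classes inside assembly.jar does not by itself show that Spark can resolve the `mongodb` short name. Spark looks short names up through the ServiceLoader file `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister`, so a further check is whether that file in the assembled jar still lists a MongoDB provider. A minimal sketch, assuming `unzip` is installed and that the connector's provider class name contains `mongodb`; the function names are hypothetical:

```shell
#!/bin/bash
# Spark discovers short names such as "mongodb" through this ServiceLoader file.
SERVICE_FILE="META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"

# Print the data source registrations found in the jar whose path is $1.
list_registered_sources() {
  unzip -p "$1" "$SERVICE_FILE" 2>/dev/null
}

# Read registrations on stdin; succeed only if a MongoDB provider is listed.
# Assumption: the connector's provider class name contains "mongodb".
has_mongo_source() {
  grep -qi 'mongodb'
}

# Run the check only when a jar path is passed as the first argument.
if [[ -n "${1:-}" ]]; then
  if list_registered_sources "$1" | has_mongo_source; then
    echo "mongodb data source is registered in $1"
  else
    echo "mongodb data source is NOT registered in $1"
  fi
fi
```

Saved under a hypothetical name such as check_sources.sh, it could be run as `bash check_sources.sh /opt/app/lib/assembly-1.0.jar`. If the entry is missing from the assembly but present in the standalone connector jar, the registration was probably lost while the jars were merged.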
[Additional info]
Spark version: 3.1.3, with mongo-spark-connector v10.2.2.
The Spark application runs in client mode.
I manually downloaded mongo-spark-connector.jar and put it on the classpath with a command like
java -cp etc::/opt/app/lib/mongo-spark-connector_2.12-10.2.2.jar:/opt/app/lib/assembly-1.0.jar
and with that it works.
Here is my build.sbt file.
name := "app"
version := "1.0"
scalaVersion in ThisBuild := "2.12.10"
lazy val sparkVersion = "3.1.3"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.12" % sparkVersion,
  "org.apache.spark" % "spark-sql_2.12" % sparkVersion,
  "org.apache.spark" % "spark-avro_2.12" % sparkVersion,
  "org.apache.parquet" % "parquet-avro" % "1.12.2",
  "org.mongodb.spark" %% "mongo-spark-connector" % "10.2.2"
)
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
  case x if x.startsWith("META-INF") && x.endsWith(".SF") => MergeStrategy.discard
  case x if x.startsWith("META-INF") && x.endsWith(".RSA") => MergeStrategy.discard
  case x if x.startsWith("META-INF") && x.endsWith(".DSA") => MergeStrategy.discard
  case x if x.startsWith("META-INF") && x.endsWith(".TXT") => MergeStrategy.discard
  case PathList(ps@_*) if ps.last endsWith ".conf" => MergeStrategy.concat
  case x => MergeStrategy.first
}
Here is the shell script that starts the application.
#!/bin/bash
CLASSPATH="";
for jar_file in /opt/app/lib/*.jar; do
  if [[ $jar_file != /opt/app/lib/app*.jar ]]; then
    CLASSPATH="$CLASSPATH:$jar_file";
  fi
done;
for jar_file in /opt/app/lib/app*.jar; do
  CLASSPATH="$CLASSPATH:$jar_file";
done;
CLASSPATH="etc:$CLASSPATH";
JVM_OPTS="$JVM_OPTS -Dfile.encoding=UTF-8";
JVM_OPTS="$JVM_OPTS -verbose:gc";
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=20"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps";
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC";
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails";
JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation";
JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=1";
JVM_OPTS="$JVM_OPTS -XX:GCLogFileSize=1M";
JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch";
JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops";
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError";
MAIN_CLASS="App"
nohup java -classpath $CLASSPATH $JVM_OPTS $MAIN_CLASS "$@" > "$log" 2>&1 < /dev/null &