The code that triggers the error is as follows:
// Read the JSON file and take the first record as a JSON string
String strJsonContent = SessionContext
        .getSparkSession()
        .read()
        .json(filePath)
        .toJSON()
        .first();
I am using Maven to build the package without bundled dependencies. The pom.xml properties are as follows:
<scala.version>2.12.1</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<jsonpath.version>2.4.0</jsonpath.version>
<hadoop-core.version>1.2.1</hadoop-core.version>
<hadoop.version>hadoop2-1.9.17</hadoop.version>
<google-cloud.version>1.135.2</google-cloud.version>
<spark-bigquery_2.12.version>0.19.0</spark-bigquery_2.12.version>
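For context, the connector is presumably declared as a regular dependency along these lines (a sketch using the standard Maven coordinates com.google.cloud.spark:spark-bigquery_2.12; the actual declaration in the pom.xml is not shown in the question):

<!-- Presumed connector declaration, wired to the property above -->
<dependency>
    <groupId>com.google.cloud.spark</groupId>
    <artifactId>spark-bigquery_2.12</artifactId>
    <version>${spark-bigquery_2.12.version}</version>
</dependency>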
Basically, this build of the Spark connector should only be used in local mode, because when the job is submitted on Google Dataproc, a parameter is passed that sets the connector to gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar.
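For reference, a Dataproc submission along these lines would pass that jar on the classpath (a sketch; the cluster name, region, main class, and application jar are placeholders):

gcloud dataproc jobs submit spark \
    --cluster=my-cluster \
    --region=us-central1 \
    --class=com.example.MyJob \
    --jars=gs://my-bucket/my-app.jar,gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar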
This code works on the 2.0-debian10 image, but after upgrading to 2.2-debian12 the job fails with this error:
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be instantiated
The stack trace points at the line of code I provided above, and it is caused by:
Caused by: java.lang.IllegalStateException: This connector was made for Scala null, it was not meant to run on Scala 2.12
Dataproc on GCE 2.1+ images pre-installs the Spark BigQuery connector: https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.2
To fix this issue, you need to remove the Spark BigQuery connector dependency from your Spark application's pom.xml, or mark it as provided.
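A minimal sketch of the provided-scope variant, assuming the connector coordinates shown earlier:

<!-- Compile against the connector, but rely on the jar preinstalled on the cluster at runtime -->
<dependency>
    <groupId>com.google.cloud.spark</groupId>
    <artifactId>spark-bigquery_2.12</artifactId>
    <version>${spark-bigquery_2.12.version}</version>
    <scope>provided</scope>
</dependency>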