I'm trying to configure Apache Spark with Apache Iceberg, but I'm getting this error:
Exception in thread "main" java.lang.NoSuchMethodError:
'org.apache.hadoop.hive.metastore.IMetaStoreClient org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(org.apache.hadoop.hive.conf.HiveConf, org.apache.hadoop.hive.metastore.HiveMetaHookLoader, java.util.concurrent.ConcurrentHashMap, java.lang.String, boolean)'
I've tried different versions of hive-metastore, from 3.0.0 to 3.1.3.
With 3.1.3 I got:
Exception in thread "main" java.lang.UnsupportedOperationException: Unsupported Hive Metastore version (3.1.3). Please set spark.sql.hive.metastore.version with a valid version.
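As far as I can tell from the Spark 3.1 docs, spark.sql.hive.metastore.version only accepts values from 0.12.0 through 2.3.8 and 3.0.0 through 3.1.2, so 3.1.3 is rejected outright, regardless of which jars are on the classpath. For example, the highest version it will accept:

.config("spark.sql.hive.metastore.version", "3.1.2")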
Here is my build.sbt:
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "2.12.15"

val sparkVersion = "3.1.1"

lazy val root = (project in file("."))
  .settings(
    name := "iceberg",
    libraryDependencies ++= Seq(
      "com.github.pureconfig" %% "pureconfig" % "0.14.0",
      "org.apache.spark" %% "spark-core" % sparkVersion % Compile,
      "org.apache.spark" %% "spark-sql" % sparkVersion % Compile,
      "org.apache.hadoop" % "hadoop-aws" % sparkVersion % Compile,
      "com.github.housepower" % "clickhouse-integration-spark_2.12" % "2.7.1",
      "com.github.housepower" % "clickhouse-native-jdbc" % "2.7.1",
      "org.apache.iceberg" %% "iceberg-spark-runtime-3.1" % "1.3.0" % Compile,
      "org.apache.spark" %% "spark-hive" % sparkVersion % Compile,
      "org.apache.hive" % "hive-metastore" % "3.1.0",
      "org.apache.iceberg" % "iceberg-hive-metastore" % "1.3.1"
    ),
    dependencyOverrides ++= Seq(
      "com.fasterxml.jackson.module" % "jackson-module-scala_2.12" % "2.14.2"
    )
  )
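One thing worth checking here (a sketch on my part, not something from the original build): spark-hive 3.1.1 already pulls in its own Hive 2.3.7 client jars, so an explicit hive-metastore 3.1.0 dependency can displace the version Spark was compiled against. On sbt 1.4+ the built-in dependency tree task should show which version actually wins:

// In the sbt shell:
// sbt> root / Compile / dependencyTree
// Look for org.apache.hive:hive-metastore and check which version gets evicted.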
And my Scala code:
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.types._
import org.apache.log4j.{Level, Logger}
import ru.gpmdata.datafactory.dooh.config.AppConfig
object MyJob extends App {
  val appConfig = AppConfig()
  val s3Config = appConfig.s3
  val chConfig = appConfig.clickhouse

  val spark = SparkSession.builder()
    .appName("MyJob")
    .master("local")
    // Enabling Iceberg Metastore
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.spark_catalog.type", "rest")
    .config("spark.sql.catalog.spark_catalog.uri", "https://iceberg.foo.org")
    .config("spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.endpoint", "http://s3.foo.org:9000")
    .config("spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.access.key", appConfig.s3.accessKey)
    .config("spark.sql.catalog.hadoop_prod.hadoop.fs.s3a.secret.key", appConfig.s3.secretKey)
    .config("spark.sql.catalog.spark_catalog.warehouse.dir", "s3a://foo-iceberg-prod")
    .config("spark.sql.hive.metastore.version", "3.1.0")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "./warehouse")
    .config("spark.sql.defaultCatalog", "spark_catalog")
    .enableHiveSupport()
    .getOrCreate()

  val icebergDF = spark.sql("select 1 as n")
  icebergDF.show()

  spark.stop()
}
This fixed the exact same problem for me:
"org.apache.hive" % "hive-metastore" % "2.3.7"
and
.config("spark.sql.catalog.iceberg_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
.config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")