Creating a new local Spark session after stopping the previous one: Spark tells me Derby did not stop

Problem description

In my unit tests I have to stop the local Spark session and then create another one (reusing the data in the metastore saved by the previous session).

But when the second Spark session is created, it cannot use the local metastore_db that the first session created.

from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark_builder = SparkSession.builder.enableHiveSupport().master("local").appName("mega_app")
    spark_session = spark_builder.getOrCreate()
    spark_session.sql("SHOW DATABASES").show()

    # Here I get normal output (even when starting this script again and again):
    #
    # +------------+
    # |databaseName|
    # +------------+
    # |     default|
    # +------------+

    spark_session.stop()

    # Here I'm trying to create a new Spark session and check it:
    spark_builder = SparkSession.builder.enableHiveSupport().master("local").appName("mega_app")
    spark_session = spark_builder.getOrCreate()
    spark_session.sql("SHOW DATABASES").show()  # crashes here!

    # And here I get the error:
    #
    # java.sql.SQLException: Unable to open a test connection to the given database.
    # JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP.
    # Terminating connection pool (set lazyInit to true if you expect to start your database after your app)
    #
    # ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@24d1c991
    # Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/felix/Projects/baka_etl/baka_etl/metastore_db.

How can I fix this?

P.S. I tested with both

pyspark==2.4.8
pyspark==3.3.4

and got the same result.

apache-spark pyspark hive derby
1 Answer

You can try pyspark 3.5.3; it works there.

I added two more lines of code: one to create a table and one to show the tables.

# cat test.py
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark_builder = SparkSession.builder.enableHiveSupport().master("local").appName("mega_app")
    spark_session = spark_builder.getOrCreate()
    spark_session.sql("SHOW DATABASES").show()
    spark_session.sql("CREATE TABLE IF NOT EXISTS T1 (C1 INT)")

    spark_session.stop()

    # Here I'm trying to create a new Spark session and check it:
    spark_builder = SparkSession.builder.enableHiveSupport().master("local").appName("mega_app")
    spark_session = spark_builder.getOrCreate()
    spark_session.sql("SHOW DATABASES").show()
    spark_session.sql("SHOW TABLES").show()

Here are the test results:

# pip list|grep pyspark
pyspark                      3.5.3

# python test.py
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/11/28 11:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/11/28 11:30:03 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
24/11/28 11:30:03 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
24/11/28 11:30:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
24/11/28 11:30:04 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
+---------+
|namespace|
+---------+
|  default|
+---------+

24/11/28 11:30:06 WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
+---------+
|namespace|
+---------+
|  default|
+---------+

24/11/28 11:30:06 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|  default|       t1|      false|
+---------+---------+-----------+
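If you are stuck on an older pyspark, another option (an assumption on my part, not something tested in the answer above) is to give each test run an explicit Derby database path instead of relying on the implicit `metastore_db` in the working directory, by setting the standard Hive property `javax.jdo.option.ConnectionURL` through Spark's `spark.hadoop.` prefix. A minimal sketch, with a hypothetical helper to build the URL:

```python
from pathlib import Path

def derby_metastore_url(db_dir: str) -> str:
    """Build an embedded-Derby JDBC URL for a Hive metastore at an
    explicit directory (hypothetical helper; the property it feeds,
    javax.jdo.option.ConnectionURL, is the standard Hive setting)."""
    path = Path(db_dir).resolve() / "metastore_db"
    return f"jdbc:derby:;databaseName={path};create=true"

# Assumed usage with the builder pattern from the question:
# SparkSession.builder.enableHiveSupport() \
#     .config("spark.hadoop.javax.jdo.option.ConnectionURL",
#             derby_metastore_url("/tmp/mega_app_meta")) \
#     .master("local").appName("mega_app").getOrCreate()
```

This makes it explicit which Derby database each session boots, which at least pinpoints the directory holding the stale lock when `XSDB6` appears.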
