I'm trying to write a DataFrame from Spark (PySpark) to an Amazon Redshift Serverless cluster using the Redshift JDBC driver.
I keep running into driver-related errors:
• java.sql.SQLException: No suitable driver
• java.lang.ClassNotFoundException: com.amazon.redshift.jdbc42.Driver
What I've tried:
1. Setup:
• Spark version: (e.g., Spark 3.3.1)
• Hadoop AWS packages: --packages org.apache.hadoop:hadoop-aws:3.3.1,com.amazonaws:aws-java-sdk-bundle:1.11.901
• Redshift JDBC driver: RedshiftJDBC42-2.1.0.30.jar downloaded from Amazon’s official site.
2. spark-submit command:
spark-submit \
  --conf spark.driver.bindAddress=127.0.0.1 \
  --conf spark.driver.host=127.0.0.1 \
  --driver-memory 4g \
  --packages org.apache.hadoop:hadoop-aws:3.3.1,com.amazonaws:aws-java-sdk-bundle:1.11.901 \
  --jars /path/to/RedshiftJDBC42-2.1.0.30.jar \
  --driver-class-path /path/to/RedshiftJDBC42-2.1.0.30.jar \
  my_script.py
I added --driver-class-path so the driver JAR is visible on the Spark driver's classpath (the same settings can also go on the SparkSession builder; see the sketch after step 3). The JAR file definitely exists at the specified path.
3. In the Python code:
jdbc_url = "jdbc:redshift://:5439/dev"

(df.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "public.my_staging_table")
    .option("user", os.environ["REDSHIFT_USER"])
    .option("password", os.environ["REDSHIFT_PASSWORD"])
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .mode("append")
    .save())
The code runs fine until the .save() step, at which point I get No suitable driver or a ClassNotFoundException for the Redshift driver class.
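As a diagnostic (a sketch only, assuming an active SparkSession named spark), the driver JVM can be asked directly whether it can load the class, via PySpark's internal Py4J handle:

# Diagnostic only: _jvm is an internal PySpark handle to the driver JVM.
# If this raises a Py4JJavaError wrapping ClassNotFoundException,
# the JAR is not on the driver classpath at all.
spark.sparkContext._jvm.java.lang.Class.forName(
    "com.amazon.redshift.jdbc42.Driver")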
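For completeness, the classpath flags from step 2 can also be set programmatically on the SparkSession builder; a minimal sketch, assuming the same placeholder JAR path (these configs only take effect if set before the driver JVM starts):

from pyspark.sql import SparkSession

# Programmatic equivalent of --jars / --driver-class-path.
# Useless once a session already exists, since the JVM is then running.
spark = (SparkSession.builder
    .appName("redshift-write")
    .config("spark.jars", "/path/to/RedshiftJDBC42-2.1.0.30.jar")
    .config("spark.driver.extraClassPath", "/path/to/RedshiftJDBC42-2.1.0.30.jar")
    .getOrCreate())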
What I know:
• The Redshift JDBC driver class should be com.amazon.redshift.jdbc42.Driver.
• I’ve seen suggestions to use --driver-class-path plus --jars to ensure the driver is on both driver and executor classpaths.
• If I remove --driver-class-path, I sometimes get ClassNotFoundException. With it, I still get No suitable driver.
• My AWS credentials work and reading from S3 is fine (I can read JSON from S3). The problem occurs only at the JDBC write to Redshift.
Questions:
• Is there another configuration step needed to ensure Spark recognizes the Redshift driver?
• Do I need to specify any additional spark configs for the JDBC driver?
• Are there known compatibility issues with this Redshift driver version and Spark/Hadoop versions?
• Should I consider a different Redshift driver JAR or a different approach (like spark-redshift or redshift-jdbc42-no-awssdk JAR)?
Any guidance on resolving these No suitable driver and ClassNotFoundException errors when writing to Redshift over JDBC from Spark would be greatly appreciated.
In your jdbc_url, jdbc:redshift://:5439/dev, the host portion is missing.
The JDBC URL must have the form:
jdbc:redshift://<host>:<port>/<database>
For example:
jdbc:redshift://mycluster.myclusteruuid.eu-west-1.redshift.amazonaws.com:5439/mydatabase
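With a complete URL, the write from the question would then look like this (the Serverless endpoint, database, and table below are placeholders; substitute your own):

import os

# Placeholder Redshift Serverless endpoint:
# <workgroup>.<account-id>.<region>.redshift-serverless.amazonaws.com
jdbc_url = ("jdbc:redshift://my-workgroup.123456789012.eu-west-1"
            ".redshift-serverless.amazonaws.com:5439/dev")

(df.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "public.my_staging_table")
    .option("user", os.environ["REDSHIFT_USER"])
    .option("password", os.environ["REDSHIFT_PASSWORD"])
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .mode("append")
    .save())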