I have installed Apache Spark (spark-2.4.5-bin-hadoop2.7) on my Mac, under the following path:
/Users/xxxx/Software/
In addition, I downloaded ojdbc6.jar into the following path:
/Users/xxxx/Software/spark/jars
Below are the updates I made to my environment variables:
export SPARK_HOME=/Users/xxxx/Software/spark
export SPARK_CLASSPATH=/Users/xxxx/spark_env/ojdbc6.jar
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
alias python='python3'
export PYSPARK_PYTHON=python3
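Note that when PyCharm is launched from the Dock/Finder it typically does not read your shell profile, so exports like these are not automatically visible inside the IDE. A minimal check you can run from whichever interpreter you are debugging (the variable names are taken from the exports above):

import os

# Print each Spark-related variable; None means the process did not
# inherit it (typical when an IDE bypasses the shell profile).
for name in ("SPARK_HOME", "SPARK_CLASSPATH", "PYSPARK_PYTHON"):
    print(name, "=", os.environ.get(name))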
In a terminal session I launched pyspark and ran the commands below, which worked fine:
conn_url = "jdbc:oracle:thin:@//xxx.xxx.xxx.xx:1521/USER"
df = (
    spark.read.format("jdbc")
    .option("url", conn_url)
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .option("dbtable", "table_name")
    .option("user", "xxxx")
    .option("password", "xxxx")
    .load()
)
I was able to query the database successfully.
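For reference, a quick way to verify the result in that same shell session (using the df created above):

# Sanity-check the JDBC read: inspect the schema and fetch a few rows.
df.printSchema()
df.show(5)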
Now I am trying to do the same thing from PyCharm.
PyCharm configuration: under Preferences -> Project Structure, I added the following content roots:
/Users/xxxx/Software/spark/jars/ojdbc6.jar
/Users/xxxx/Software/spark-2.4.5-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip
/Users/xxxx/Software/spark-2.4.5-bin-hadoop2.7/python/lib/pyspark.zip
Then, when I run "main.py" (which contains the code that connects to and queries the DB), I get the error below:
Status: Failure
Error: An error occurred while calling o71.load.
: java.sql.SQLException: No suitable driver
at java.sql/java.sql.DriverManager.getDriver(DriverManager.java:298)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:104)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:35)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:835)
Adding the jar to the project content roots does not work, because Spark looks for the jar on its own classpath, i.e. in the $SPARK_HOME/jars folder. There are a couple of options to fix this:
1. In your Python main script, set the submit arguments before the Spark session starts: os.environ['PYSPARK_SUBMIT_ARGS'] = "--jars file:///<path-to-driver>/ojdbc-<version>.jar pyspark-shell" (see the sketch below).
2. Copy your driver's jar into the $SPARK_HOME/jars folder.
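For the first option, here is a minimal sketch of what main.py could look like. It reuses the jar location and connection values from the question; treat the service name, table, and credentials as placeholders to be replaced with your own:

import os

# Must be set before the SparkSession (and its JVM) is created;
# otherwise the --jars flag is never passed to spark-submit.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars file:///Users/xxxx/Software/spark/jars/ojdbc6.jar pyspark-shell"
)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-jdbc-test").getOrCreate()

conn_url = "jdbc:oracle:thin:@//xxx.xxx.xxx.xx:1521/USER"

df = (
    spark.read.format("jdbc")
    .option("url", conn_url)
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .option("dbtable", "table_name")
    .option("user", "xxxx")
    .option("password", "xxxx")
    .load()
)

df.show(5)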