I am trying to read the result of a query from MariaDB into a PySpark DataFrame. The jar I am using is
--jars mariadb-java-client-2.2.2.jar
I am able to read a whole table using:
df = spark.read.format("jdbc")\
.option("url","jdbc:mariadb://xxx.xxx.xx.xx:xxxx/hdpms")\
.option("driver", "org.mariadb.jdbc.Driver")\
.option("dbtable", Mytable)\
.option("user", "xxxxx_xxxxx")\
.option("password", "xxxxx")\
.load()
Now I am looking for a way to run a simple query, for example:
SELECT col1,col2,col3,.. From MyTable Where date>2019 and cond2;
I can get it to run by using the following:
"MyTable date>2019 and cond2 --"
because the jar adds SELECT * FROM at the beginning and where 1=0 at the end.
However, I am getting the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o455.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 12, xhadoopm3095p.aetna.com, executor 2): java.sql.SQLException: Value "DATE_CREATED" cannot be parse as Timestamp
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalTimestamp(TextRowProtocol.java:592)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getTimestamp(SelectResultSet.java:1178)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$11.apply(JdbcUtils.scala:439)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$11.apply(JdbcUtils.scala:438)
Can anyone help me with this? Thanks.
df = spark.read.format("jdbc")\
.option("url","jdbc:mariadb://xxx.xxx.xx.xx:xxxx/hdpms")\
.option("driver", "org.mariadb.jdbc.Driver")\
.option("dbtable", "(SELECT col1,col2,col3,.. From MyTable Where date>2019 and cond2) tmp")\
.option("user", "xxxxx_xxxxx")\
.option("password", "xxxxx")\
.load()
Wrap the query in parentheses, give it an alias, and pass it in place of the table name; it will work.
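To see why the alias matters, here is a minimal sketch of the string handling involved. It assumes, as the question describes, that the dbtable value is placed between SELECT * FROM and WHERE 1=0 when the schema is resolved. The helper names subquery_as_dbtable and schema_probe are hypothetical and exist only for illustration; they are not part of Spark or the MariaDB driver.

```python
def subquery_as_dbtable(query: str, alias: str = "tmp") -> str:
    """Wrap a SELECT statement so it can be passed as the JDBC dbtable option."""
    # Drop surrounding whitespace and a trailing semicolon; a semicolon is
    # not valid inside a parenthesized subquery.
    cleaned = query.strip().rstrip(";").strip()
    return f"({cleaned}) {alias}"


def schema_probe(dbtable: str) -> str:
    """Approximate the wrapper statement described in the question (assumption)."""
    return f"SELECT * FROM {dbtable} WHERE 1=0"


dbtable = subquery_as_dbtable("SELECT col1, col2 FROM MyTable WHERE date > 2019;")
print(dbtable)
print(schema_probe(dbtable))
```

The second printed line is a well-formed statement because the parenthesized query with its alias behaves like a table name, so no trailing comment trick is needed. The resulting string would then be passed as .option("dbtable", dbtable) in the reader shown above.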