Spark SQL query optimization

Question   Votes: -2   Answers: 2

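I want to load database tables into Spark DataFrames. I have 2 tables in my database. Is it necessary to write the full set of connection options twice? Is there a way to write the common part once and then change only the table name each time?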

table1 = spark.read\
.format("jdbc")\
.option("url","jdbc:oracle:thin:USER/Password@host:port/db_name")\
.option("driver","oracle.jdbc.driver.OracleDriver" )\
.option("dbtable","table_name_1")\
.load()


table2 = spark.read\
    .format("jdbc")\
    .option("url","jdbc:oracle:thin:USER/Password@host:port/db_name")\
    .option("driver","oracle.jdbc.driver.OracleDriver" )\
    .option("dbtable","table_name_2")\
    .load()
python-3.x apache-spark pyspark bigdata data-science
2 Answers
1
vote

You can build the reader once, separately:

reader = (spark.read
  .format("jdbc")
  .option("url","jdbc:oracle:thin:USER/Password@host:port/db_name")
  .option("driver","oracle.jdbc.driver.OracleDriver" ))

and then load each table from it:

table1 = reader.option("dbtable","table_name_1").load()
table2 = reader.option("dbtable","table_name_2").load()
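
If there are more than two tables, the same idea can be written as a loop. The following is only a sketch: COMMON_OPTIONS, jdbc_options and load_tables are hypothetical names that are not part of the answer above, and spark is assumed to be an existing SparkSession passed in by the caller. It keeps the shared options in a plain dict and hands them to the reader through options(**...).

```python
# Shared connection settings, written once (same placeholder values as above).
COMMON_OPTIONS = {
    "url": "jdbc:oracle:thin:USER/Password@host:port/db_name",
    "driver": "oracle.jdbc.driver.OracleDriver",
}


def jdbc_options(table_name, common=COMMON_OPTIONS):
    """Return a new options dict for one table; the shared dict is not modified."""
    return {**common, "dbtable": table_name}


def load_tables(spark, table_names):
    """Load each table into a DataFrame and return them keyed by table name.

    `spark` is assumed to be an already created SparkSession.
    """
    return {
        name: spark.read.format("jdbc").options(**jdbc_options(name)).load()
        for name in table_names
    }


# Usage (needs a live SparkSession and a reachable database):
# tables = load_tables(spark, ["table_name_1", "table_name_2"])
# table1 = tables["table_name_1"]
```

Building a fresh dict per table means each load starts from the same shared settings, so nothing depends on the order in which the tables are read.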

2
votes

Please see the snippet below; I hope it helps.

def load_table_df(table_name):
    # You can define "jdbc:oracle:thin:USER/Password@host:port/db_name" as parameter too.
    return spark.read\
        .format("jdbc")\
        .option("url","jdbc:oracle:thin:USER/Password@host:port/db_name")\
        .option("driver","oracle.jdbc.driver.OracleDriver" )\
        .option("dbtable", table_name)\
        .load()

table1 = load_table_df('table_name_1')
table2 = load_table_df('table_name_2')
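
As the comment in the function suggests, the URL can be a parameter as well. One possible way to do that without repeating it on every call is functools.partial. This is a sketch under assumptions: make_loader is a hypothetical helper, the function signature here differs from the one above (spark, url and driver are passed in explicitly), and spark is assumed to be an existing SparkSession.

```python
from functools import partial


def load_table_df(spark, url, driver, table_name):
    """Load one table over JDBC; every connection detail is a parameter."""
    return (
        spark.read
        .format("jdbc")
        .option("url", url)
        .option("driver", driver)
        .option("dbtable", table_name)
        .load()
    )


def make_loader(spark, url, driver="oracle.jdbc.driver.OracleDriver"):
    """Bind the shared arguments once; the result takes only a table name."""
    return partial(load_table_df, spark, url, driver)


# Usage (needs a live SparkSession and a reachable database):
# load = make_loader(spark, "jdbc:oracle:thin:USER/Password@host:port/db_name")
# table1 = load("table_name_1")
# table2 = load("table_name_2")
```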