I have the following Spark SQL code (Spark pool, Spark 3.0) and I want to pass a variable into it. How can I do that? I tried the following:
#cell 1 (Toggle parameter cell):
%%pyspark
stat = 'A'
#cell 2:
select * from silver.employee_dim where Status= '$stat'
If you run the cell as PySpark, you can pass a variable into your query:
#cell 1 (Toggle parameter cell):
%%pyspark
stat = 'A' #define variable
#cell 2:
%%pyspark
query = "select * from silver.employee_dim where Status='" + stat + "'"
spark.sql(query) #execute SQL
Since you are running a SELECT statement, I assume you will want to load the result into a DataFrame:
sqlDf = spark.sql(query)
sqlDf.head(5) #select first 5 rows
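One thing to watch with plain string concatenation is a value that itself contains a single quote, which would break the query. Below is a minimal sketch of a helper that escapes the value before building the string. The name build_status_query is hypothetical, and the escaping assumes Spark SQL's default behavior of treating a backslash as the escape character inside string literals, so check that against your pool's settings.

```python
def build_status_query(stat: str) -> str:
    """Build the SELECT statement with `stat` embedded as a string literal."""
    # Escape backslashes first, then single quotes, so the value cannot
    # terminate the literal early (assumes default backslash escaping).
    escaped = stat.replace("\\", "\\\\").replace("'", "\\'")
    return "select * from silver.employee_dim where Status='" + escaped + "'"


query = build_status_query("A")
print(query)

# In the notebook you would then run, as in the answer above:
# sqlDf = spark.sql(query)
```

For a simple value like 'A' this produces exactly the same query string as the concatenation shown above.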
If anyone is looking for another way, Microsoft's answer at this link describes one: https://learn.microsoft.com/en-us/aswers/questions/419296/spark-sql-sql-passing-variables-variables-synapse-(spark--spark--池)
Cell 1
%%pyspark
myVar = 'test'
spark.conf.set("myapp.myVar", myVar)
Cell 2
%%sql
SELECT * FROM myTable WHERE myVal = '${myapp.myVar}'
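I found this answer really useful. I'd just add that, although it isn't required, adding a "SET" command seems to clear the potential error that the Synapse notebook flags in the %%sql magic cell.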
%%sql
SET myapp.myVar;
SELECT * FROM myTable WHERE myVal = '${myapp.myVar}'
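To make the substitution step more concrete, here is a plain-Python sketch that mimics what happens to a `${...}` reference: the key is looked up in the configuration and its value is spliced into the SQL text before the query runs. This is only an illustration, not Spark's code. The function substitute_vars and the conf dictionary are hypothetical stand-ins for the session configuration, and leaving unknown keys untouched is a choice made for this sketch.

```python
import re


def substitute_vars(sql: str, conf: dict) -> str:
    """Replace each ${key} in `sql` with conf[key]; unknown keys are left as-is."""
    def repl(match):
        key = match.group(1)
        # Fall back to the original "${key}" text when the key is not set.
        return conf.get(key, match.group(0))

    return re.sub(r"\$\{([^}]+)\}", repl, sql)


# Stand-in for spark.conf.set("myapp.myVar", myVar)
conf = {"myapp.myVar": "test"}
sql = "SELECT * FROM myTable WHERE myVal = '${myapp.myVar}'"
print(substitute_vars(sql, conf))
```

Because the value is pasted in as text, the surrounding single quotes in the SQL are still needed for string values.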