Accessing Azure Data Lake Storage Gen2 from Scala

Question · 0 votes · 1 answer

I can connect to ADLS Gen2 from a notebook running on Azure Databricks, but the connection fails when I run the same code as a jar job. I used the same settings as in the notebook, except for the dbutils call (see the aside after the notebook snippet): the Spark conf values in my Scala code match the ones in the notebook.

Notebook:

spark.conf.set(
  "fs.azure.account.key.xxxx.dfs.core.windows.net",
  dbutils.secrets.get(scope = "kv-secrets", key = "xxxxxx"))

spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")

spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")

// spark.read already returns a DataFrame, so the toDF() below is a no-op
val rdd = sqlContext.read.format("csv")
  .option("header", "true")
  .load("abfss://xxxx@xxxx.dfs.core.windows.net/test/sample.csv")
val df: DataFrame = rdd.toDF()
// Write the result out as Parquet
df.write.parquet("abfss://xxxx@xxxx.dfs.core.windows.net/test/Sales.parquet")

Scala code (jar job):

val sc = SparkContext.getOrCreate()
val spark = SparkSession.builder().getOrCreate()
sc.getConf.setAppName("Test")

sc.getConf.set("fs.azure.account.key.xxxx.dfs.core.windows.net", "<actual key>")
sc.getConf.set("fs.azure.account.auth.type", "OAuth")
sc.getConf.set("fs.azure.account.oauth.provider.type",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
sc.getConf.set("fs.azure.account.oauth2.client.id", "<app id>")
sc.getConf.set("fs.azure.account.oauth2.client.secret", "<app password>")
sc.getConf.set("fs.azure.account.oauth2.client.endpoint",
  "https://login.microsoftonline.com/<tenant id>/oauth2/token")
sc.getConf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")

val sqlContext = spark.sqlContext
// read returns a DataFrame already, so the toDF() below is a no-op
val rdd = sqlContext.read.format("csv")
  .option("header", "true")
  .load("abfss://xxxx@xxxx.dfs.core.windows.net/test/sample.csv")
val df: DataFrame = rdd.toDF()
println(df.count())
// Write the result out as Parquet
df.write.parquet("abfss://xxxx@xxxx.dfs.core.windows.net/test/Sales.parquet")

I expected the Parquet file to be written. Instead, I get the following error:

19/04/20 13:58:40 ERROR Uncaught throwable from user code: Configuration property xxxx.dfs.core.windows.net not found.
  at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:385)
  at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:802)
  at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:133)
  at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:103)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
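The key frame in that trace is getStorageAccountKey: the account key never reached the configuration the ABFS driver reads. SparkContext.getConf returns a copy of the SparkConf, so calling set on it mutates a throwaway object. A minimal check (not from the original post) makes this visible:

val sc = SparkContext.getOrCreate()

// getConf returns a *copy* of the SparkConf; set() mutates only that copy
sc.getConf.set("fs.azure.account.key.xxxx.dfs.core.windows.net", "<actual key>")

// A fresh copy does not contain the key we just "set"
println(sc.getConf.contains("fs.azure.account.key.xxxx.dfs.core.windows.net")) // false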

scala apache-spark azure-data-lake azure-databricks
1 Answer (0 votes)

Never mind, silly mistake. The settings have to go through spark.conf.set rather than sc.getConf.set: SparkContext.getConf returns a copy of the configuration, and a SparkConf cannot be changed once the context is running, so those set calls never took effect. spark.conf.set, by contrast, updates the session's runtime configuration, which Spark copies into the Hadoop configuration used when reading abfss:// paths. It should be:

val sc = SparkContext.getOrCreate()
val spark = SparkSession.builder().getOrCreate()
sc.getConf.setAppName("Test")

spark.conf.set("fs.azure.account.key.xxxx.dfs.core.windows.net", "<actual key>")
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "<app id>")
spark.conf.set("fs.azure.account.oauth2.client.secret", "<app password>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint",
  "https://login.microsoftonline.com/<tenant id>/oauth2/token")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
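For completeness, the same fix folded into a self-contained jar-job skeleton. This is a minimal sketch, not the original poster's code; the object name is invented, and the placeholders and paths are carried over from the masked question:

import org.apache.spark.sql.{DataFrame, SparkSession}

object AdlsGen2Test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Test").getOrCreate()

    // Runtime config is copied into the Hadoop configuration behind abfss://
    spark.conf.set("fs.azure.account.key.xxxx.dfs.core.windows.net", "<actual key>")
    spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")

    val df: DataFrame = spark.read
      .format("csv")
      .option("header", "true")
      .load("abfss://xxxx@xxxx.dfs.core.windows.net/test/sample.csv")

    println(df.count())
    df.write.parquet("abfss://xxxx@xxxx.dfs.core.windows.net/test/Sales.parquet")
  }
}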
