How to check whether an Azure Blob Storage path exists using Scala Spark or PySpark

Question · votes: 0 · answers: 3

Please tell me how to check whether the following blob file exists.

File path: "wasbs://[email protected]/directoryname/meta_loaddate=20190512/"

scala azure pyspark blob azure-blob-storage
3 Answers

0 votes

Here is code that should work for you; feel free to edit/customize it to your needs:

    from azure.storage.blob import BlockBlobService  # legacy azure-storage SDK (v2.x)
    from pyspark.sql import SparkSession

    # Set up the Spark session and register the storage account key for wasbs:// access
    session = SparkSession.builder.getOrCreate()
    session.conf.set(
        "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
        "<storage-account-key>")
    sdf = session.read.parquet(
        "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<prefix>")

    block_blob_service = BlockBlobService(account_name='<storage-account-name>',
                                          account_key='<storage-account-key>')

    def blob_exists(container_name, blob_name):
        # exists() returns True if the blob exists, False otherwise
        return block_blob_service.exists(container_name, blob_name)

    blobstat = blob_exists("<container-name>", "<blob-name>")
    print(blobstat)  # True if the blob exists, else False
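The exists() call above takes the container and blob names separately, while the question starts from a full wasbs:// URL. As a sketch using only the standard library (the helper name parse_wasbs_url is hypothetical, not part of any SDK), the URL can be split apart like this:

```python
from urllib.parse import urlparse

def parse_wasbs_url(url):
    """Split a wasbs:// URL of the form
    wasbs://<container>@<account>.blob.core.windows.net/<path>
    into (account, container, blob_path)."""
    parsed = urlparse(url)
    container = parsed.username                   # the part before '@'
    account = parsed.hostname.split(".")[0]       # '<account>' from the host name
    blob_path = parsed.path.lstrip("/")           # blob name relative to the container
    return account, container, blob_path
```

The resulting pieces can then be fed to BlockBlobService(account_name=...) and blob_exists(container, blob_path).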

0 votes
import com.microsoft.azure.storage.CloudStorageAccount

private val storageConnectionString = s"DefaultEndpointsProtocol=http;AccountName=$account;AccountKey=$accessKey"

private val cloudStorageAccount = CloudStorageAccount.parse(storageConnectionString)

private val serviceClient = cloudStorageAccount.createCloudBlobClient

private val container = serviceClient.getContainerReference("data")

val ref = container.getBlockBlobReference(path)
val existOrNot = ref.exists() // the method is exists(), not exist()

0 votes

For those who want to perform this check through the Spark configuration (and use the secrets stored there), the following code can be used:

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new URI(<url to blob storage>), spark.sparkContext.hadoopConfiguration)
val path = new Path(url + s"/$directory")
fs.exists(path)