I have configured Spark to authenticate with a service account, following this documentation: https://docs.databricks.com/en/connect/storage/gcs.html
spark.conf.set("spark.hadoop.google.cloud.auth.service.account.enable", "true")
spark.conf.set("spark.hadoop.fs.gs.auth.service.account.email", client_email)
spark.conf.set("spark.hadoop.fs.gs.project.id", project_id)
spark.conf.set("spark.hadoop.fs.gs.auth.service.account.private.key", private_key)
spark.conf.set("spark.hadoop.fs.gs.auth.service.account.private.key.id", private_key_id)
spark.conf.set("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
spark.conf.set("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
Then I read a test CSV file from my GCS bucket:

gcs_path = "gs://ddfsdfts/events/31dfsdfs4_2025_02_01_0000000000000000000000.csv"
df = spark.read.format("csv") \
.option("header", "true") \
.option("inferSchema", "true") \
.load(gcs_path)
df.show()
When I call df.show(), it fails while trying to fetch an access token from the GCE metadata server. I have seen a few other questions like this, but no straightforward answer. Why is it trying to reach the metadata server for a token instead of using the service account key I configured?
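One thing worth checking is whether the Hadoop configuration layer ever saw these options, since settings applied with spark.conf.set after startup are not always visible there. A quick diagnostic sketch (note that _jsc is an internal PySpark handle):

# Inspect the context-level Hadoop configuration; a None here suggests the
# credentials were never applied at this layer, leaving the GCS connector
# to fall back to the metadata server.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
print(hconf.get("fs.gs.auth.service.account.email"))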
Rather than setting these with spark.conf.set, try setting them at the cluster level, as described in the Databricks documentation linked above. Credentials set with spark.conf.set on a running cluster are not always picked up by the Hadoop filesystem layer, and when the GCS connector cannot find a usable key it falls back to the GCE metadata server for a token:
> spark.hadoop.google.cloud.auth.service.account.enable true
> spark.hadoop.fs.gs.auth.service.account.email <client-email>
> spark.hadoop.fs.gs.project.id <project-id>
> spark.hadoop.fs.gs.auth.service.account.private.key {{secrets/scope/gsa_private_key}}
> spark.hadoop.fs.gs.auth.service.account.private.key.id {{secrets/scope/gsa_private_key_id}}
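For reference, the same secret-backed values can be read in a notebook with dbutils.secrets. A minimal sketch, assuming a secret scope named scope with the key names taken from the placeholders above:

# Read the key material from the Databricks secret scope referenced above
# (the scope and key names are assumptions based on the {{secrets/...}} placeholders).
private_key = dbutils.secrets.get(scope="scope", key="gsa_private_key")
private_key_id = dbutils.secrets.get(scope="scope", key="gsa_private_key_id")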