The problem is simple: you have a local Spark instance (either a cluster, or just running it in local mode) and you want to read from `gs://`. Here I submit the solution I arrived at by combining different resources:

1. Download the GCS connector jar and place it in the `$SPARK/jars/` folder (check Alternative 1 at the bottom).
2. Download the `core-site.xml` file from here, or copy it from below. This is the configuration file used by Hadoop (which Spark uses in turn). Store the `core-site.xml` file in a folder; I personally created the `$SPARK/conf/hadoop/conf/` folder and stored it there.
3. In the `spark-env.sh` file, point Hadoop to that folder by adding the line: `export HADOOP_CONF_DIR=</absolute/path/to/hadoop/conf/>`
4. Create an OAuth2 key in the corresponding page of the Google Console -> API-Manager -> Credentials.
5. Copy the credentials into the `core-site.xml` file.

Alternative 1: instead of copying the jar into the `$SPARK/jars` folder, you can store it in any folder and add that folder to the Spark classpath. One way is to edit `SPARK_CLASSPATH` in `spark-env.sh`, but `SPARK_CLASSPATH` is now deprecated. Therefore, look here for how to add a jar to the Spark classpath.
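Steps 1–3 can be sketched in the shell like this. The paths and the connector download URL are examples only (check Google's documentation for the current connector version), and `$HOME/spark` is just a placeholder for your actual Spark install:

```shell
# Example layout for steps 1-3; adjust SPARK_HOME to your installation.
SPARK_HOME="${SPARK_HOME:-$HOME/spark}"
mkdir -p "$SPARK_HOME/jars" "$SPARK_HOME/conf/hadoop/conf"

# Step 1: put the GCS connector jar into Spark's jars folder.
# (The URL below is an example; verify the current version before using it.)
# curl -o "$SPARK_HOME/jars/gcs-connector-hadoop2-latest.jar" \
#   https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar

# Step 2: place core-site.xml (contents below) in the conf folder created above.

# Step 3: tell Hadoop (and therefore Spark) where to find it:
export HADOOP_CONF_DIR="$SPARK_HOME/conf/hadoop/conf"
```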
<configuration>
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    <description>Register GCS Hadoop filesystem</description>
  </property>
  <property>
    <name>fs.gs.auth.service.account.enable</name>
    <value>false</value>
    <description>Force OAuth2 flow</description>
  </property>
  <property>
    <name>fs.gs.auth.client.id</name>
    <value>32555940559.apps.googleusercontent.com</value>
    <description>Client id of Google-managed project associated with the Cloud SDK</description>
  </property>
  <property>
    <name>fs.gs.auth.client.secret</name>
    <value>fslkfjlsdfj098ejkjhsdf</value>
    <description>Client secret of Google-managed project associated with the Cloud SDK</description>
  </property>
  <property>
    <name>fs.gs.project.id</name>
    <value>_THIS_VALUE_DOES_NOT_MATTER_</value>
    <description>This value is required by GCS connector, but not used in the tools provided here.
      The value provided is actually an invalid project id (starts with `_`).
    </description>
  </property>
</configuration>
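To avoid copy/paste mistakes, you can also write the file from the shell and sanity-check that the properties the GCS connector needs are present. This is a sketch only: it writes a skeleton with placeholder credentials into a temporary directory (in real use, write your filled-in file into the `HADOOP_CONF_DIR` from step 3):

```shell
# Write a skeleton core-site.xml (temp dir used here purely for illustration).
HADOOP_CONF_DIR="$(mktemp -d)"
cat > "$HADOOP_CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property><name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value></property>
  <property><name>fs.gs.auth.service.account.enable</name><value>false</value></property>
  <property><name>fs.gs.auth.client.id</name><value>YOUR_CLIENT_ID</value></property>
  <property><name>fs.gs.auth.client.secret</name><value>YOUR_CLIENT_SECRET</value></property>
  <property><name>fs.gs.project.id</name><value>_THIS_VALUE_DOES_NOT_MATTER_</value></property>
</configuration>
EOF

# Check each required key is declared; prints "ok: <key>" for every hit.
for key in fs.gs.impl fs.gs.auth.client.id fs.gs.auth.client.secret fs.gs.project.id; do
  grep -q "<name>$key</name>" "$HADOOP_CONF_DIR/core-site.xml" && echo "ok: $key"
done
```

Remember to replace `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` with the values from step 4.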