Accessing GCS with the Hadoop client from outside the cloud

Problem description (votes: 0, answers: 2)

I want to access Google Cloud Storage through a Hadoop client, from a machine that is outside of Google Cloud.

I followed the instructions here. I created a service account and generated a key file. I also created the core-site.xml file and downloaded the necessary libraries.
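For reference, a key file along these lines is typically generated with gcloud (a sketch only; the service-account and project names below are hypothetical placeholders):

# Hypothetical names; substitute your own service account and project.
gcloud iam service-accounts create hadoop-gcs-client
# Generates a JSON key file for that account.
gcloud iam service-accounts keys create key.json \
    --iam-account=hadoop-gcs-client@my-project.iam.gserviceaccount.com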

However, when I try to run a simple hdfs dfs -ls gs://bucket-name command, all I get is:

Error getting access token from metadata server at: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token

When I run this inside Google Cloud it works, but when trying to connect to GCS from the outside it shows the error above.

How can I connect to GCS with the Hadoop client this way? Is it even possible? I have no route to the 169.254.169.254 address.

Here is my core-site.xml (I changed the key path and the email for this example):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>spark.hadoop.google.cloud.auth.service.account.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>spark.hadoop.google.cloud.auth.service.account.json.keyfile</name>
    <value>path/to/key.json</value>
  </property>
  <property>
    <name>fs.gs.project.id</name>
    <value>ringgit-research</value>
    <description>
      Optional. Google Cloud Project ID with access to GCS buckets.
      Required only for list buckets and create bucket operations.
    </description>
  </property>
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    <description>The AbstractFileSystem for gs: uris.</description>
  </property>
  <property>
    <name>fs.gs.auth.service.account.email</name>
    <value>myserviceaccountaddress@google</value>
    <description>
      The email address is associated with the service account used for GCS
      access when fs.gs.auth.service.account.enable is true. Required
      when authentication key specified in the Configuration file (Method 1)
      or a PKCS12 certificate (Method 3) is being used.
    </description>
  </property>
</configuration>
google-cloud-platform hdfs google-cloud-storage
2 Answers
0 votes

It could be that the Hadoop services have not yet picked up the changes in your core-site.xml file, so my suggestion is to restart the Hadoop services. Another action you can take is to check the access control options [1].
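On a plain Apache Hadoop installation, a restart could look roughly like this (a sketch assuming the standard sbin scripts; managed distributions such as Cloudera or Ambari have their own restart tooling):

# Restart the HDFS daemons so they re-read core-site.xml.
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh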

If you still run into the same problem after taking the suggested actions, please post the complete error message.

[1] https://cloud.google.com/storage/docs/access-control/


0 votes

The problem was that I was trying the wrong authentication method. The method I was using assumes that it is running inside Google Cloud and tries to connect to the Google metadata server. When running outside of Google Cloud, that does not work, for obvious reasons.

The answer is this: Migrating 50TB data from local Hadoop cluster to Google Cloud Storage. The accepted answer there uses the correct core-site.xml.

The property fs.gs.auth.service.account.keyfile should be used instead of spark.hadoop.google.cloud.auth.service.account.json.keyfile. The only difference is that this property expects a p12 key file rather than a json one.
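Put together, the relevant part of core-site.xml would then look roughly like this (a minimal sketch based on the linked answer and the property names above; the email and key path are placeholders):

  <!-- Placeholder values; use your own service-account email and p12 key path. -->
  <property>
    <name>fs.gs.auth.service.account.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.gs.auth.service.account.email</name>
    <value>my-service-account@my-project.iam.gserviceaccount.com</value>
  </property>
  <property>
    <name>fs.gs.auth.service.account.keyfile</name>
    <value>path/to/key.p12</value>
  </property>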
