Hadoop job stuck in ACCEPTED state; YARN ResourceManager log shows java.net.UnknownHostException

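As the title says, I deployed a Hadoop v2.6.3 cluster on an internal network with static IPs of the form 10.0.0.x. I then ran a sample WordCount program, but the shell only printed the following output and then hung: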

hadoop jar wc.jar WordCount /user/alex/data/kaggle.sample /user/alex/wc/output  
16/04/06 10:44:29 INFO client.RMProxy: Connecting to ResourceManager at master/10.0.0.7:8032
16/04/06 10:44:29 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/04/06 10:44:30 INFO input.FileInputFormat: Total input paths to process : 1
16/04/06 10:44:30 INFO mapreduce.JobSubmitter: number of splits:1
16/04/06 10:44:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459942813464_0002
16/04/06 10:44:30 INFO impl.YarnClientImpl: Submitted application application_1459942813464_0002
16/04/06 10:44:30 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1459942813464_0002/
16/04/06 10:44:30 INFO mapreduce.Job: Running job: job_1459942813464_0002

I then opened the Hadoop cluster web UI and found the job in the ACCEPTED state, not running. I checked the YARN ResourceManager log file, and its last ERROR message was:

2016-04-06 10:34:42,466 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1459942813464_0001_02_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: worker14.alex
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:256)
at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:220)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:269)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:896)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:937)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:930)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:842)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:823)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: worker14.alex
... 19 more
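
The hostname that the ResourceManager failed to resolve can be pulled out of a log like the one above mechanically. The following is only a minimal Python sketch, assuming the log text is already available as a string; the helper name `unknown_hosts` is hypothetical and not part of Hadoop.

```python
import re

# Matches the exception name followed by the hostname that failed to resolve.
_UNKNOWN_HOST = re.compile(r"java\.net\.UnknownHostException:\s*(\S+)")

def unknown_hosts(log_text):
    """Return the distinct hostnames reported as unresolvable, in first-seen order."""
    seen = []
    for match in _UNKNOWN_HOST.finditer(log_text):
        host = match.group(1)
        if host not in seen:
            seen.append(host)
    return seen

# Two lines taken from the stack trace in the question.
sample = (
    "java.lang.IllegalArgumentException: java.net.UnknownHostException: worker14.alex\n"
    "Caused by: java.net.UnknownHostException: worker14.alex\n"
)
print(unknown_hosts(sample))  # prints ['worker14.alex']
```

Running this over the whole ResourceManager log would show whether worker14.alex is the only affected name or whether other workers report a suffixed hostname as well.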

The Hadoop configuration files are as follows:

#core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/alex/hadoop-2.6.3/tmp/</value>
    </property>
</configuration>
#yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/home/alex/hadoop-2.6.3/tmp/nm.local</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/home/alex/hadoop-2.6.3/log/nm.log</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
#mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>10.0.0.7:10020</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/home/alex/hadoop-2.6.3/tmp/staging</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/home/alex/hadoop-2.6.3/tmp/mr-history/tmp</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/home/alex/hadoop-2.6.3/tmp/mr-history/done</value>
    </property>
</configuration>

The /etc/hosts file maps the IPs to master and worker1 through worker14.

The slaves file lists master and worker1 through worker14.

Something seems to be wrong with my hostname resolution: the host shows up as worker14.alex instead of worker14 (alex is my Linux username).
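
To illustrate the mismatch: if a node reports itself as worker14.alex while /etc/hosts only defines worker14, a lookup of the longer name can fail. The sketch below is a rough Python check that only inspects hosts-file text (it does not consult DNS or any other resolver source). The function names and the sample addresses are assumptions made for illustration; the question only says the IPs look like 10.0.0.x.

```python
def hosts_names(hosts_text):
    """Collect every hostname and alias defined in /etc/hosts-style text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        fields = line.split()
        names.update(fields[1:])  # first field is the IP, the rest are names
    return names

def missing_names(hosts_text, wanted):
    """Return the names in `wanted` that the hosts text does not define."""
    known = hosts_names(hosts_text)
    return [name for name in wanted if name not in known]

# Hypothetical file contents; the worker address is made up.
example_hosts = """
10.0.0.7   master
10.0.0.21  worker14   # illustrative address only
"""
print(missing_names(example_hosts, ["master", "worker14", "worker14.alex"]))
# prints ['worker14.alex']
```

On a real node the text could be read from /etc/hosts and compared against the names that appear in the ResourceManager log.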

So is something wrong with my configuration? Do I need to reboot all the servers, or is it enough to restart some services, for example:

service networking restart

java hadoop hostname
2 Answers

0 votes

Did you ever reach a resolution? I am seeing exactly the same problem, with the exception Caused by: java.net.UnknownHostException: var. – Nishant Kelkar

Check this value in your yarn-site.xml:

<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>

If you prefix this path with "hdfs://", the error occurs.
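
A likely reason is that, under generic URI syntax, whatever follows "hdfs://" up to the next slash is read as the host, so "hdfs://" + "var/log/..." makes "var" the hostname. Hadoop itself parses paths with Java classes; the sketch below uses Python's `urllib.parse` purely as a stand-in to show the same parsing rule, and the helper name `authority_of` is hypothetical.

```python
from urllib.parse import urlparse

def authority_of(path):
    """Return the host part a generic URI parser sees in `path`, or None."""
    return urlparse(path).hostname

print(authority_of("hdfs://var/log/hadoop-yarn/apps"))   # prints var
print(authority_of("hdfs:///var/log/hadoop-yarn/apps"))  # prints None
print(authority_of("/var/log/hadoop-yarn/apps"))         # prints None
```

Only the first form carries a host component, which matches the "UnknownHostException: var" reported in the comment above.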


0 votes

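This problem occurs because you have not configured the yarn.nodemanager.hostname property in yarn-site.xml.
To fix it, configure this property in yarn-site.xml on each NodeManager machine, using that machine's own IP address: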

  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>current_worker_ip</value>
  </property>
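
Because the value differs on every worker, the snippet could be generated per machine instead of edited by hand. This is a minimal Python sketch under stated assumptions: the worker names and addresses in `workers` are made up, and the helper `nodemanager_hostname_property` is hypothetical. It only builds the XML fragment and does not write into yarn-site.xml.

```python
import xml.etree.ElementTree as ET

def nodemanager_hostname_property(address):
    """Build the <property> element that sets yarn.nodemanager.hostname."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.nodemanager.hostname"
    ET.SubElement(prop, "value").text = address
    return ET.tostring(prop, encoding="unicode")

# Hypothetical mapping of worker name to its static IP.
workers = {"worker1": "10.0.0.8", "worker14": "10.0.0.21"}
for host, ip in workers.items():
    # One fragment per worker, to be placed in that worker's yarn-site.xml.
    print(host, nodemanager_hostname_property(ip))
```

Each printed fragment belongs inside the `<configuration>` element of the corresponding worker's yarn-site.xml, and the NodeManager on that worker would need a restart to pick it up.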