如标题所述,我在内网部署了一个hadoop v2.6.3集群,静态ip如10.0.0.x。 然后我运行了一个示例 WordCount 程序 但是,shell 只是给出输出并挂起:
hadoop jar wc.jar WordCount /user/alex/data/kaggle.sample /user/alex/wc/output
16/04/06 10:44:29 INFO client.RMProxy: Connecting to ResourceManager at master/10.0.0.7:8032
16/04/06 10:44:29 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/04/06 10:44:30 INFO input.FileInputFormat: Total input paths to process : 1
16/04/06 10:44:30 INFO mapreduce.JobSubmitter: number of splits:1
16/04/06 10:44:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459942813464_0002
16/04/06 10:44:30 INFO impl.YarnClientImpl: Submitted application application_1459942813464_0002
16/04/06 10:44:30 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1459942813464_0002/
16/04/06 10:44:30 INFO mapreduce.Job: Running job: job_1459942813464_0002
然后我进入 Hadoop Cluster Web UI,发现作业状态为 ACCEPTED,并且未运行。我检查了YARN.ResourceManager的日志文件,它的最后一条ERROR消息是这样的:
2016-04-06 10:34:42,466 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1459942813464_0001_02_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: worker14.alex
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:256)
at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:220)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:269)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:896)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:937)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:930)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:842)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:823)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: worker14.alex
... 19 more
Hadoop 配置文件如下:
#core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/alex/hadoop-2.6.3/tmp/</value>
</property>
</configuration>
#yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/alex/hadoop-2.6.3/tmp/nm.local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/alex/hadoop-2.6.3/log/nm.log</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
#mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>10.0.0.7:10020</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/home/alex/hadoop-2.6.3/tmp/staging</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/home/alex/hadoop-2.6.3/tmp/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/home/alex/hadoop-2.6.3/tmp/mr-history/done</value>
</property>
</configuration>
/etc/hosts
文件已将 ip 映射到 master 或worker1 -worker14
slaves
文件是 master、worker1 -worker14
我的主机名解析似乎出了问题。它是
worker14.alex
而不是worker14
(alex
是我的linux用户名)
那么我的配置有什么问题吗?我需要重新启动所有服务器吗?或者我只需要重新启动一些服务,例如
service networking restart
?
你能达成决议吗?我看到了完全相同的问题,我看到 Caused by: java.net.UnknownHostException: var 异常。 – 尼尚特·凯尔卡
检查你的yarn-site.xml,这个值:
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
如果在路径前加上“hdfs://”,则会出现错误。
出现此问题是因为您没有在
yarn.nodemanager.hostname
中配置 yarn-site.xml
属性。yarn-site.xml
中配置此属性:
<property>
<name>yarn.nodemanager.hostname</name>
<value>current_worker_ip</value>
</property>