我正在尝试运行有关 Hadoop Pipes 的教程中的示例:
我在编译和一切方面都取得了成功。然而,运行后它显示了一个
NullPointerException
错误。我尝试了很多方法并阅读了很多类似的问题,但无法找到该问题的实际解决方案。
注意:我是在单机上运行的伪分布式环境。
hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriters=true -input /input -output /output -program /bin/wordcount
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
15/02/18 01:09:02 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/02/18 01:09:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/02/18 01:09:02 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/02/18 01:09:03 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
15/02/18 01:09:04 INFO mapred.FileInputFormat: Total input paths to process : 1
15/02/18 01:09:04 INFO mapreduce.JobSubmitter: number of splits:1
15/02/18 01:09:04 INFO Configuration.deprecation: hadoop.pipes.java.recordreader is deprecated. Instead, use mapreduce.pipes.isjavarecordreader
15/02/18 01:09:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local143452495_0001
15/02/18 01:09:06 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:9000/bin/wordcount as file:/tmp/hadoop-abdulrahman/mapred/local/1424214545411/wordcount
15/02/18 01:09:06 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/02/18 01:09:06 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/02/18 01:09:06 INFO mapreduce.Job: Running job: job_local143452495_0001
15/02/18 01:09:06 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
15/02/18 01:09:06 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/18 01:09:06 INFO mapred.LocalJobRunner: Starting task: attempt_local143452495_0001_m_000000_0
15/02/18 01:09:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/02/18 01:09:06 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/input/data.txt:0+68
15/02/18 01:09:07 INFO mapred.MapTask: numReduceTasks: 1
15/02/18 01:09:07 INFO mapreduce.Job: Job job_local143452495_0001 running in uber mode : false
15/02/18 01:09:07 INFO mapreduce.Job: map 0% reduce 0%
15/02/18 01:09:07 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/02/18 01:09:07 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/02/18 01:09:07 INFO mapred.MapTask: soft limit at 83886080
15/02/18 01:09:07 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/02/18 01:09:07 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/02/18 01:09:07 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/02/18 01:09:08 INFO mapred.LocalJobRunner: map task executor complete.
15/02/18 01:09:08 WARN mapred.LocalJobRunner: job_local143452495_0001
java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:104)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:69)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/18 01:09:08 INFO mapreduce.Job: Job job_local143452495_0001 failed with state FAILED due to: NA
15/02/18 01:09:08 INFO mapreduce.Job: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:264)
at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:503)
at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:518)
我下载了hadoop的源代码并跟踪了异常发生的位置,看来异常发生在初始化阶段,因此mapper/reducer内部的代码并不是真正的问题。
Hadoop 中产生异常的函数是这个:
/** Run a set of tasks and waits for them to complete. */
435 private void runTasks(List<RunnableWithThrowable> runnables,
436 ExecutorService service, String taskType) throws Exception {
437 // Start populating the executor with work units.
438 // They may begin running immediately (in other threads).
439 for (Runnable r : runnables) {
440 service.submit(r);
441 }
442
443 try {
444 service.shutdown(); // Instructs queue to drain.
445
446 // Wait for tasks to finish; do not use a time-based timeout.
447 // (See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6179024)
448 LOG.info("Waiting for " + taskType + " tasks");
449 service.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
450 } catch (InterruptedException ie) {
451 // Cancel all threads.
452 service.shutdownNow();
453 throw ie;
454 }
455
456 LOG.info(taskType + " task executor complete.");
457
458 // After waiting for the tasks to complete, if any of these
459 // have thrown an exception, rethrow it now in the main thread context.
460 for (RunnableWithThrowable r : runnables) {
461 if (r.storedException != null) {
462 throw new Exception(r.storedException);
463 }
464 }
465 }
问题是它存储异常然后抛出它,这使我无法知道异常的实际来源。
如何解决这个问题?
经过大量研究,我发现问题实际上是由 Pipes/Application.java 中的这一行(第 104 行)引起的:
byte[] password= jobToken.getPassword();
我更改了代码并重新编译了hadoop:
byte[] password= "no password".getBytes();
if (jobToken != null)
{
password= jobToken.getPassword();
}
我从这里
得到这个这解决了问题,我的程序当前正在运行,但我面临另一个问题,程序实际上挂在地图 0% 减少 0% 我会为这个问题另开一个话题。
谢谢你,