我有一个使用旧API运行的hadoop作业,我将我的实现移动到新的API并且在运行它时遇到了问题。当作业运行时,不会抛出任何异常,但我从未获得任何输出文件。在旧的API下,它将生成带有我的排序结果列表的输出文件。这是正在运行的工作:
Configuration config = new Configuration();
Job job = Job.getInstance(config, "sorting");
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(SortMapper.class);
job.setCombinerClass(SortReducer.class);
job.setReducerClass(SortReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(inputFileLocation));
FileOutputFormat.setOutputPath(job, new Path(outputFileLocation));
job.setJarByClass(HadoopTest.class);
long startTime = System.currentTimeMillis();
job.submit();
long endTime = System.currentTimeMillis();
long duration = endTime - startTime;
System.out.println("Duration: " + duration);
这是我的映射器impl:
public static class SortMapper extends MultithreadedMapper<LongWritable, Text, IntWritable, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private IntWritable intKey = new IntWritable();
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
intKey.set(Integer.parseInt(value.toString()));
context.write(intKey, one);
}
}
这是我的减速机impl:
public static class SortReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
@Override
protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
Iterator<IntWritable> iterator = values.iterator();
while (iterator.hasNext()) {
sum += iterator.next().get();
}
context.write(key, new IntWritable(sum));
}
}
日志显示如下(当使用旧API运行时,我总是抱怨“无法加载领域映射信息......”和“无法加载native-hadoop ...”:
2014-03-18 10:19:41.299 java[13311:1d03] Unable to load realm mapping info from SCDynamicStore
14/03/18 10:19:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/18 10:19:41 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/03/18 10:19:41 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/03/18 10:19:41 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/03/18 10:19:41 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/03/18 10:19:41 INFO input.FileInputFormat: Total input paths to process : 2
14/03/18 10:19:41 INFO mapreduce.JobSubmitter: number of splits:2
14/03/18 10:19:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local904621238_0001
14/03/18 10:19:42 WARN conf.Configuration: file:/tmp/hadoop-james.mchugh/mapred/staging/james.mchugh904621238/.staging/job_local904621238_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/03/18 10:19:42 WARN conf.Configuration: file:/tmp/hadoop-james.mchugh/mapred/staging/james.mchugh904621238/.staging/job_local904621238_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/03/18 10:19:42 WARN conf.Configuration: file:/tmp/hadoop-james.mchugh/mapred/local/localRunner/james.mchugh/job_local904621238_0001/job_local904621238_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/03/18 10:19:42 WARN conf.Configuration: file:/tmp/hadoop-james.mchugh/mapred/local/localRunner/james.mchugh/job_local904621238_0001/job_local904621238_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/03/18 10:19:42 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/03/18 10:19:42 INFO mapred.LocalJobRunner: OutputCommitter set in config null
试试job.waitForCompletion(true);
而不是job.submit();
。由于您在本地运行mapreduce,因此应在JUnit终止本地jobtracker之前等待结果。