Background:
Problem:
What I observed:
Oozie pipeline example:
Java_Action_1 (which points to a java class that is being run)
Java_Action_2 (which points to a java class that is being run)
Java_Action_3 (which points to a java class that is being run)
Subworkflow_1 (has a fork and join step, seen it in the Oozie UI)
Java_Action_1_in_subworkflow (which points to a java class that is being run) -> the job that is not writing to HDFS
Java_Action_1_in_subworkflow (which points to a java class that is being run)
Java_Action_4 (which points to a java class that is being run)
Java_Action_5 (which points to a java class that is being run)
etc.
The problem turned out to be the fs.defaultFS Hadoop property. We use viewfs, and the output path passed to Apache Crunch was prefixed with viewfs://, so the job could not write to HDFS. The fix was to set defaultFS to hdfs:// for the write stage. Reads come from an S3 bucket that is mounted on HDFS as /folder_name; for the read stage, file paths must be prefixed with viewfs://.
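To illustrate the scheme-switching described above, here is a minimal sketch of a helper that rewrites a path's URI scheme (viewfs:// for the read stage, hdfs:// for the write stage) using only the JDK's java.net.URI. The class name, the authority `cluster`, and the example paths are hypothetical; in the real pipeline the rewritten path would be handed to Crunch (or the Hadoop Configuration's fs.defaultFS would be set accordingly) rather than just printed.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeSwap {
    // Replace the URI scheme (e.g. viewfs -> hdfs) while keeping the
    // authority and path intact. Hypothetical helper for illustration.
    static String withScheme(String path, String scheme) throws URISyntaxException {
        URI u = new URI(path);
        return new URI(scheme, u.getAuthority(), u.getPath(),
                       u.getQuery(), u.getFragment()).toString();
    }

    public static void main(String[] args) throws Exception {
        // Write stage: Crunch output must target hdfs:// (assumed path).
        System.out.println(withScheme("viewfs://cluster/folder_name/out", "hdfs"));
        // Read stage: the S3-backed /folder_name mount is read via viewfs://.
        System.out.println(withScheme("hdfs://cluster/folder_name/in", "viewfs"));
    }
}
```

The same effect can also be achieved per-job by setting `fs.defaultFS` in the action's Hadoop configuration instead of rewriting individual paths.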