I am trying to build the workloads of Intel's HiBench on GCP. I built the Maven project successfully, and I set up the configuration as follows:
hadoop.conf:
# Hadoop home
hibench.hadoop.home /usr/lib/hadoop
# The path of hadoop executable
hibench.hadoop.executable ${hibench.hadoop.home}/bin/hadoop
# Hadoop configuration directory
hibench.hadoop.configure.dir ${hibench.hadoop.home}/etc/hadoop
# The root HDFS path to store HiBench data
hibench.hdfs.master hdfs://external-ip-address-of-my-cluster:8020
# Hadoop release provider. Supported value: apache
hibench.hadoop.release apache
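
As a sanity check on the hibench.hdfs.master value, you can ask Hadoop which NameNode URI the cluster is actually configured with (a minimal check using the standard hdfs getconf CLI; on Dataproc-style clusters this is usually the master's internal hostname rather than its external IP, which is an assumption worth verifying):

# Print the default filesystem URI; hibench.hdfs.master should match this host and port.
hdfs getconf -confKey fs.defaultFS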
spark.conf:
whammouda@hibench-cluster-m:~/HiBench$ cat conf/spark.conf
# Spark home
hibench.spark.home /usr/lib/spark
# Spark master
# standalone mode: spark://xxx:7077
# YARN mode: yarn-client
hibench.spark.master spark://external-ip-address-of-my-cluster:7077
# executor number and cores when running on Yarn
hibench.yarn.executor.num 2
hibench.yarn.executor.cores 4
# executor and driver memory in standalone & YARN mode
spark.executor.memory 4g
spark.driver.memory 4g
# set spark parallelism property according to hibench's parallelism value
spark.default.parallelism ${hibench.default.map.parallelism}
# set spark sql's default shuffle partitions according to hibench's parallelism value
spark.sql.shuffle.partitions ${hibench.default.shuffle.parallelism}
#======================================================
# Spark Streaming
#======================================================
# Spark Streaming batch interval in milliseconds (default: 100)
hibench.streambench.spark.batchInterval 100
# Number of nodes that will receive kafka input (default: 4)
hibench.streambench.spark.receiverNumber 4
# Indicate RDD storage level. (default: 2)
# 0 = StorageLevel.MEMORY_ONLY
# 1 = StorageLevel.MEMORY_AND_DISK_SER
# other = StorageLevel.MEMORY_AND_DISK_SER_2
hibench.streambench.spark.storageLevel 2
# Indicate whether to test the new write-ahead log feature (default: false)
hibench.streambench.spark.enableWAL false
# If the WAL is enabled, the HDFS path for storing the streaming context must be specified; if false, it can be empty (default: /var/tmp)
hibench.streambench.spark.checkpointPath /var/tmp
hibench.streambench.spark.useDirectMode true
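
One thing worth double-checking against the file's own comments: spark://...:7077 only works if a standalone Spark master is actually listening on that port. As a hedged alternative, assuming the cluster runs Spark on YARN (the usual setup on Dataproc), the master line would instead look like:

# spark.conf entry for YARN mode, per the comment above (verify against your cluster):
hibench.spark.master yarn-client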
When I run bin/workloads/micro/wordcount/prepare/prepare.sh, I get this error:

AssertionError: Cannot find files under certain paths, please set hibench.hadoop.examples.test.jar manually
/home/whammouda/HiBench/bin/functions/workload_functions.sh: line 38: .: filename argument required
.: usage: . filename [arguments]
start HadoopPrepareWordcount bench
./prepare.sh: line 26: INPUT_HDFS: unbound variable
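
For context, the AssertionError comes from HiBench's configuration probe, which searches a few standard locations for the MapReduce examples and test jars. A quick way to check whether those jars exist on the cluster at all (the directory below is an assumption, typical for Debian-based Hadoop packages such as Dataproc's):

# Look for the examples jar and the jobclient tests jar under the usual package dir:
find /usr/lib/hadoop-mapreduce -name 'hadoop-mapreduce-examples*.jar'
find /usr/lib/hadoop-mapreduce -name '*jobclient*tests*.jar'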
For reference, the bash file looks like this:
current_dir=`dirname "$0"`
current_dir=`cd "$current_dir"; pwd`
root_dir=${current_dir}/../../../../../
workload_config=${root_dir}/conf/workloads/micro/wordcount.conf
. "${root_dir}/bin/functions/load_bench_config.sh"
enter_bench HadoopPrepareWordcount ${workload_config} ${current_dir}
show_bannar start
rmr_hdfs $INPUT_HDFS || true
START_TIME=`timestamp`
run_hadoop_job ${HADOOP_EXAMPLES_JAR} randomtextwriter \
-D mapreduce.randomtextwriter.totalbytes=${DATASIZE} \
-D mapreduce.randomtextwriter.bytespermap=$(( ${DATASIZE} / ${NUM_MAPS} )) \
-D mapreduce.job.maps=${NUM_MAPS} \
-D mapreduce.job.reduces=${NUM_REDS} \
${INPUT_HDFS}
END_TIME=`timestamp`
show_bannar finish
leave_bench
I tried setting the ${HADOOP_EXAMPLES_JAR} environment variable to point to the jar I downloaded from Maven, but I still get the same error. (There is no hadoop examples jar under the initial directory.)
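
Since the failing property is named in the error itself (hibench.hadoop.examples.test.jar), a workaround to sketch is pinning the jars in conf/hadoop.conf rather than exporting ${HADOOP_EXAMPLES_JAR} directly, since HiBench may regenerate its environment from the conf files and overwrite an exported variable. The companion property hibench.hadoop.examples.jar and the paths below are assumptions; substitute whatever the find commands above report:

# In conf/hadoop.conf -- pin the probed jars explicitly (paths and <version> are placeholders):
hibench.hadoop.examples.jar      /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
hibench.hadoop.examples.test.jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-<version>-tests.jar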
Your INPUT_HDFS variable is not set. Try something like:

set -u
...
rmr_hdfs "$INPUT_HDFS" || true
...

Also, https://www.shellcheck.net/ can help you fix the other errors in the script.
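
If shellcheck is installed locally, the same checks can be run from the command line (a minimal sketch; the package name is typical for Debian-based systems):

# Install and run shellcheck against the failing script:
sudo apt-get install shellcheck
shellcheck bin/workloads/micro/wordcount/prepare/prepare.sh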