I am new to GNU Parallel and am trying to run some simulations. I have a bash script that I submit to a cluster through SLURM; it is given below. Essentially, a function run_simulation is called in parallel, and that function in turn calls a bash script. That bash script generates its output in the current directory, and the output is different for each job.
#!/bin/bash
# Job name:
#SBATCH --job-name=Run_MD_Sim
#
# Account:
#SBATCH --account=fc_mllam
#
# Partition:
#SBATCH --partition=savio3
#
# Request one node:
#SBATCH --nodes=1
#
# Specify number of tasks for use case (example):
#
#
# Processors per task:
#SBATCH --cpus-per-task=2
#
# Wall clock limit:
#SBATCH --time=5:00:30
#
## Command(s) to run (example):
module load intel
module load openmpi
module load gcc
module load cmake
module load gnu-parallel/2019.03.22
energy_list=("90")
fluence_list=("1000")
len_energy=${#energy_list[@]}
len_fluence=${#fluence_list[@]}
# Change this line if number of nodes requested is changed
val="ALE_Cycle_Run_2.sh"
# Function to run MD simulation for a single combination of energy and fluence
run_simulation() {
    enval="$1"
    flval="$2"
    counter="$3"
    val="$4"
    # Create a directory to carry out the computations. With a single node we are already in fcmd_bondorder
    mkdir "Temp_Directory_$counter"
    # If more than one node were used, the working directory would be the home directory and the lines below would need to change
    cp ../temp_000588-322.cfg "Temp_Directory_$counter/temp_000000-000.cfg"
    # Copy the simulation files into this folder
    cp *.o "Temp_Directory_$counter/"
    cp *.cpp "Temp_Directory_$counter/"
    cp *.h "Temp_Directory_$counter/"
    cp Makefile "Temp_Directory_$counter/"
    cp "$val" "Temp_Directory_$counter/"
    cp Bond_Param_Gen.sh "Temp_Directory_$counter/Bond_Param_Gen.sh"
    # Change into the temporary directory; bail out if that fails
    cd "Temp_Directory_$counter" || return 1
    # Run the main MD simulation. The output is written to the current directory
    bash "$val" "$enval" "$flval"
    # Make a directory to store the bond-order files
    mkdir Data/
    # Discard the .txt output of the MD run; only the bond-order files generated below are kept
    mv *.txt Data/
    rm Data/*.txt
    bash Bond_Param_Gen.sh
    mv *.txt Data/
    mv *.cfg Data/
    # Home directory or scratch directory
    directory="/global/home/users/shoubhaniknath"
    new_filename="Data ${flval} impacts energy ${enval} number ${counter}"
    # Rename and move the data folder
    mv "Data" "$directory/$new_filename"
}
# Export the function so that GNU Parallel can access it
export -f run_simulation
# Set the number of parallel jobs from the number of CPUs on the node and the CPUs requested per task
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))
# Run simulations in parallel
for enval in "${energy_list[@]}"; do
    for flval in "${fluence_list[@]}"; do
        # Use GNU Parallel to parallelize the loop over 'counter'
        # Use the line below for multiple nodes
        # parallel --dry-run --jobs $JOBS_PER_NODE --slf hostfile run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
        # For a single node, use the line below
        echo $JOBS_PER_NODE
        parallel --jobs $JOBS_PER_NODE --joblog task.log --resume --bar run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
    done
done
My problem is that I cannot get the parallel progress bar to print, and I don't know why. A simple parallel command executed in the current working directory does show the progress bar. What am I doing wrong here?
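For reference, a minimal interactive test along these lines does show the bar for me; the sleep payload is just a placeholder workload:

parallel --bar 'sleep 1; echo task {}' ::: 1 2 3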
Try something like this:
parallel --jobs $JOBS_PER_NODE --joblog task.log --resume --bar run_simulation {2} {3} {1} "$val" ::: {1..3} ::: "${energy_list[@]}" ::: "${fluence_list[@]}"
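Here {1} takes its values from the first ::: list (the counter 1..3), {2} from energy_list and {3} from fluence_list, so a single parallel invocation covers every combination and replaces the two nested bash loops. If you want to inspect the generated command lines first, --dry-run (already used in the commented-out multi-node line) prints them without executing anything, for example:

parallel --dry-run --jobs "$JOBS_PER_NODE" run_simulation {2} {3} {1} "$val" ::: {1..3} ::: "${energy_list[@]}" ::: "${fluence_list[@]}"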
Figured it out later. The progress bar is only displayed on the compute node, so to see the progress bar you should use srun.
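A minimal sketch of that, reusing the account, partition and resources from the batch script above (adjust to your site; the script name below is a placeholder):

# Get an interactive shell on a compute node, then run the script there so that
# the --bar output (which parallel writes to stderr) reaches your terminal.
srun --account=fc_mllam --partition=savio3 --nodes=1 --cpus-per-task=2 --time=5:00:30 --pty bash -i
# ...then, on the compute node:
bash run_md_parallel.sh    # placeholder name for the script shown above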