I am new to GNU Parallel and am trying to run some simulations. I have a bash script that I submit to a cluster through SLURM; it is given below. Essentially, a function run_simulation is called in parallel, and that function in turn calls a bash script. That bash script generates its output in the current directory, and the output is different for each job.
#!/bin/bash
# Job name:
#SBATCH --job-name=Run_MD_Sim
#
# Account:
#SBATCH --account=fc_mllam
#
# Partition:
#SBATCH --partition=savio3
#
# Request one node:
#SBATCH --nodes=1
#
# Specify number of tasks for use case (example):
#
#
# Processors per task:
#SBATCH --cpus-per-task=2
#
# Wall clock limit:
#SBATCH --time=5:00:30
#
## Command(s) to run (example):
module load intel
module load openmpi
module load gcc
module load cmake
module load gnu-parallel/2019.03.22
energy_list=("90")
fluence_list=("1000")
len_energy=${#energy_list[@]}
len_fluence=${#fluence_list[@]}
# Change this line if number of nodes requested is changed
val="ALE_Cycle_Run_2.sh"
# Function to run MD simulation for a single combination of energy and fluence
run_simulation() {
    enval="$1"
    flval="$2"
    counter="$3"
    val="$4"
    # Create a directory to carry out the computations. With a single node we are already in fcmd_bondorder
    mkdir "Temp_Directory_$counter"
    # If more than one node were used, the working directory would be the home directory and the lines below would need to change
    cp ../temp_000588-322.cfg "Temp_Directory_$counter/temp_000000-000.cfg"
    # Copy the simulation files into this folder
    cp *.o "Temp_Directory_$counter/"
    cp *.cpp "Temp_Directory_$counter/"
    cp *.h "Temp_Directory_$counter/"
    cp Makefile "Temp_Directory_$counter/"
    cp "$val" "Temp_Directory_$counter/"
    cp Bond_Param_Gen.sh "Temp_Directory_$counter/Bond_Param_Gen.sh"
    # Change into the temporary directory; bail out if that fails
    cd "Temp_Directory_$counter" || return 1
    # Run the main MD simulation. The output is written to the current directory
    bash "$val" "$enval" "$flval"
    # Make a directory to store the bond-order files
    mkdir Data/
    # Discard the .txt output of the MD run; only the bond-order files generated below are kept
    mv *.txt Data/
    rm Data/*.txt
    bash Bond_Param_Gen.sh
    mv *.txt Data/
    mv *.cfg Data/
    # Home directory or scratch directory
    directory="/global/home/users/shoubhaniknath"
    new_filename="Data ${flval} impacts energy ${enval} number ${counter}"
    # Rename and move the data folder
    mv "Data" "$directory/$new_filename"
}
# Export the function so that GNU Parallel can access it
export -f run_simulation
# Set the number of parallel jobs from the number of CPUs on the node and the CPUs requested per task
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))
# Run simulations in parallel
for enval in "${energy_list[@]}"; do
    for flval in "${fluence_list[@]}"; do
        # Use GNU Parallel to parallelize the loop over 'counter'
        # Use the line below for multiple nodes
        # parallel --dry-run --jobs $JOBS_PER_NODE --slf hostfile run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
        # For a single node, use the line below
        echo $JOBS_PER_NODE
        parallel --jobs $JOBS_PER_NODE --joblog task.log --resume --bar run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
    done
done
My problem is that I cannot get the parallel progress bar to print, and I don't know why. A simple parallel command executed in the current working directory does show the progress bar. What am I doing wrong here?
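For reference, a minimal interactive test along these lines does show the bar for me; the sleep payload is just a placeholder workload:

parallel --bar 'sleep 1; echo task {}' ::: 1 2 3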
Try something like this:
parallel --jobs $JOBS_PER_NODE --joblog task.log --resume --bar run_simulation {2} {3} {1} "$val" ::: {1..3} ::: "${energy_list[@]}" ::: "${fluence_list[@]}"
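Here {1} takes its values from the first ::: list (the counter 1..3), {2} from energy_list and {3} from fluence_list, so a single parallel invocation covers every combination and replaces the two nested bash loops. If you want to inspect the generated command lines first, --dry-run (already used in the commented-out multi-node line) prints them without executing anything, for example:

parallel --dry-run --jobs "$JOBS_PER_NODE" run_simulation {2} {3} {1} "$val" ::: {1..3} ::: "${energy_list[@]}" ::: "${fluence_list[@]}"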
Figured it out later. The progress bar is only displayed on the compute node, so to see the progress bar you should use srun.
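A minimal sketch of that, reusing the account, partition and resources from the batch script above (adjust to your site; the script name below is a placeholder):

# Get an interactive shell on a compute node, then run the script there so that
# the --bar output (which parallel writes to stderr) reaches your terminal.
srun --account=fc_mllam --partition=savio3 --nodes=1 --cpus-per-task=2 --time=5:00:30 --pty bash -i
# ...then, on the compute node:
bash run_md_parallel.sh    # placeholder name for the script shown above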