GNU Parallel does not show its progress bar when run from a SLURM script

Question    Votes: 0    Answers: 2

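I am new to GNU Parallel and am trying to run some simulations. I have a bash script that I submit to a cluster through SLURM; the script is given below. Essentially, it calls a function, run_simulation, in parallel, and that function in turn calls another bash script. That script writes its output to the current directory, and the output is different for each job.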

#!/bin/bash
# Job name:
#SBATCH --job-name=Run_MD_Sim
#
# Account:
#SBATCH --account=fc_mllam
#
# Partition:
#SBATCH --partition=savio3
#
# Request one node:
#SBATCH --nodes=1
#
# Specify number of tasks for use case (example):
#
#
# Processors per task:
#SBATCH --cpus-per-task=2
#
# Wall clock limit:
#SBATCH --time=5:00:30
#
## Command(s) to run (example):

module load intel
module load openmpi
module load gcc
module load cmake
module load gnu-parallel/2019.03.22

energy_list=("90")
fluence_list=("1000")

len_energy=${#energy_list[@]}
len_fluence=${#fluence_list[@]}

# Change this line if number of nodes requested is changed
val="ALE_Cycle_Run_2.sh"

# Function to run MD simulation for a single combination of energy and fluence
run_simulation() {
    enval="$1"
    flval="$2"
    counter="$3"
    val="$4"
    
    # Create a directory to carry out computations. If node=1, then we are in fcmd_bondorder
    mkdir "Temp_Directory_$counter"
    
    # Check if using more than one node. If more than one node is used, then working directory will be the home directory. Below lines will change
    cp ../temp_000588-322.cfg "Temp_Directory_$counter/temp_000000-000.cfg"

    # Copy simulation files into this folder
    cp *.o "Temp_Directory_$counter/"
    cp *.cpp "Temp_Directory_$counter/"
    cp *.h "Temp_Directory_$counter/"
    cp Makefile "Temp_Directory_$counter/"
    cp "$val" "Temp_Directory_$counter/"
    cp Bond_Param_Gen.sh "Temp_Directory_$counter/Bond_Param_Gen.sh"

    # Change directory to temporary directory
    cd "Temp_Directory_$counter"
    
    # Run the main MD simulation. The output will be stored in the current directory
    bash "$val" "$enval" "$flval"

    # Make directory to store the bond-order files
    mkdir Data/
    mv *.txt Data/
    rm Data/*.txt
    bash Bond_Param_Gen.sh
    mv *.txt Data/
    mv *.cfg Data/

    # Home directory or scratch directory
    directory="/global/home/users/shoubhaniknath"
    new_filename="Data ${flval} impacts energy ${enval} number ${counter}"

    # Rename and move the data folder
    mv "Data" "$directory/$new_filename"
}

# Export the function so that GNU Parallel can access it
export -f run_simulation

# Set number of jobs based on number of cores available and number of threads per core
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))

# Run simulations in parallel
for enval in "${energy_list[@]}"; do
    for flval in "${fluence_list[@]}"; do
        # Use GNU Parallel to parallelize the loop over 'counter'
        # Use below line for multiple nodes
        #  parallel --dry-run --jobs $JOBS_PER_NODE --slf hostfile run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
        # For single node, use below line
        echo "$JOBS_PER_NODE"
        parallel --jobs "$JOBS_PER_NODE" --joblog task.log --resume --bar run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
    done
done

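My problem is that I cannot get parallel to print its progress bar, and I do not know why. A simple parallel command executed in the current working directory does show the progress bar. What am I doing wrong here?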

progress-bar gnu-parallel
2 answers
0 votes

Try something like this:

parallel --jobs $JOBS_PER_NODE --joblog task.log --resume --bar run_simulation {2} {3} {1} "$val" ::: {1..3} ::: "${energy_list[@]}" ::: "${fluence_list[@]}"
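
To make the placeholder mapping in that command easier to follow, here is a rough pure-bash sketch of the job list it should generate. It assumes GNU Parallel combines the three ::: input sources as a Cartesian product with the last source varying fastest, so {1} is the counter, {2} the energy and {3} the fluence. The helper name list_jobs is hypothetical and is not part of GNU Parallel; the values are the ones from the question's script.

```shell
#!/bin/bash
# Values copied from the question's script.
energy_list=("90")
fluence_list=("1000")
val="ALE_Cycle_Run_2.sh"

# Hypothetical helper (not a GNU Parallel feature): print the commands that
# the single parallel call with three ::: sources is expected to generate.
list_jobs() {
    local counter enval flval
    for counter in 1 2 3; do                        # first  ::: source -> {1}
        for enval in "${energy_list[@]}"; do        # second ::: source -> {2}
            for flval in "${fluence_list[@]}"; do   # third  ::: source -> {3}
                # Argument order matches: run_simulation {2} {3} {1} "$val"
                echo "run_simulation $enval $flval $counter $val"
            done
        done
    done
}

list_jobs
```

On the cluster, adding --dry-run to the real parallel command (the flag already used in the commented-out line of the question) should print the actual commands without running them, which is a more direct check than this sketch.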

0 votes

I figured it out later. The progress bar is only displayed on the compute node, so to see it you should run the script through srun.
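
One possible reading of this, offered as a sketch rather than a confirmed diagnosis: GNU Parallel draws --bar on stderr, and under sbatch the script's stderr goes to the job's output file (slurm-<jobid>.out by default) instead of a terminal. The snippet below uses a hypothetical helper, progress_flag, that only emits --bar when stderr is attached to a terminal. The srun line in the comments reuses the account and partition from the question, and run_md.sh is a placeholder name for the submitted script.

```shell
#!/bin/bash
# Hypothetical helper: print "--bar" only when stderr is a terminal, which is
# the case in an interactive session but normally not in an sbatch batch job.
progress_flag() {
    if [ -t 2 ]; then
        echo "--bar"
    fi
}

# Show what the helper decides in the current environment.
echo "progress flag: '$(progress_flag)'"

# Intended use inside the job script (not executed here):
#   parallel --jobs "$JOBS_PER_NODE" --joblog task.log --resume $(progress_flag) \
#       run_simulation "$enval" "$flval" {} "$val" ::: {1..3}
#
# To watch the bar live, start an interactive shell on a compute node and run
# the script there (run_md.sh is a placeholder for the script in the question):
#   srun --account=fc_mllam --partition=savio3 --nodes=1 --cpus-per-task=2 \
#       --time=5:00:30 --pty bash
#   bash run_md.sh
```

If the job has to stay a batch job, following the output file with tail -f slurm-<jobid>.out may show whatever parallel wrote to stderr, though the bar's redraws are unlikely to render cleanly in a file.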
