Simplest way to run SLURM on multiple files

Question

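I have a Python script that processes about 10,000 FITS files one at a time. For each file, the script writes its output to the same directory as the input file and creates a single CSV file recording statistics about the processed files.

Previously, I parallelized the script with async and a multiprocessing pool, but I now have access to a SLURM cluster and would like to run it with SLURM.

What is the simplest way to do this? All files are stored in the same directory, and there is no particular processing order. Edit: I also need to activate a conda environment before running the Python script. The Python script should take a file name and start running. I usually pass the file name via args. Thanks.

******** Edit/update: I managed to get it working.
First, I created a bash script that submits the jobs: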

#!/bin/bash

# Define the directory containing FITS files
INPUT_DIR="input_dir"
LOG_DIR="${INPUT_DIR}/logs"

# Ensure the logs directory exists
mkdir -p "$LOG_DIR"

# List all FITS files and write their paths to a temporary file
find "$INPUT_DIR" -name "*.fits" > file_list.txt

# Loop through each FITS file and submit a SLURM job
while IFS= read -r filepath; do
    sbatch run2.sh "$filepath"
done < file_list.txt

This script calls run2.sh, which contains the following:

#!/bin/bash
#SBATCH -p long
#SBATCH -J test
#SBATCH -n 1
#SBATCH -t 00:05:00
# %j expands to the job ID, so the jobs do not overwrite one another's logs
#SBATCH --output=file_%j.out
#SBATCH --error=file_%j.err


source miniconda3/bin/activate my_env

# Define variables
# EVENT_PATH="directory_path"

# Run Python script
python3 -u my_python_code.py "$1" "False" 3

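My next concern is that this approach creates 10k jobs, because I have 10k images to analyze, even though each image takes only a few seconds to analyze. Maybe there is a smarter way to do this.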

Answer 1

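I had a similar need some time ago; below is the script I used to solve it. What you need is a SLURM array job, in which each task gets its own set of resources and can run on a different file.

Below, I pass the $SLURM_ARRAY_TASK_ID environment variable to Python as an argument (sys.argv[2]) to decide which file to operate on. It is essentially the index of the task within the job array, as defined in the SLURM job array documentation. The %a in --job-name and --output is meant to be replaced by this index as well, so that each job/output gets a unique name. Note that SLURM documents %a as a filename pattern, so it is expanded in --output but may be left as-is in --job-name. You can pass further arguments to the slurm script and on to the Python script, e.g. arg1 -> $1 -> sys.argv[1]. Of course, your core/memory/time requirements will differ.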
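The answer does not show the Python side, so here is a minimal sketch of how a script such as batchfunc.py might turn the array index into one file. It assumes sys.argv[1] is the data directory and sys.argv[2] is the 1-based task index; pick_file and the commented-out process_file are hypothetical names, not part of the original script.

```python
import sys
from pathlib import Path


def pick_file(input_dir, task_id, pattern="*.fits"):
    """Return the file for a 1-based array task index, or None if out of range."""
    # Sort so that every array task sees the files in the same order.
    files = sorted(Path(input_dir).glob(pattern))
    if 1 <= task_id <= len(files):
        return files[task_id - 1]
    return None


def main(argv):
    input_dir = argv[1]      # arg1 -> $1 -> sys.argv[1]
    task_id = int(argv[2])   # $SLURM_ARRAY_TASK_ID -> sys.argv[2]
    path = pick_file(input_dir, task_id)
    if path is None:
        print(f"task {task_id}: no matching file", file=sys.stderr)
        return 1
    print(f"task {task_id}: processing {path}")
    # process_file(path)  # hypothetical: the per-file analysis goes here
    return 0


# Only run when called with exactly two arguments (directory, task index).
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Sorting the listing matters: each task lists the directory independently, and an unsorted listing could map the same index to different files on different nodes.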

#!/bin/bash
# use as:
# sbatch --job-name=name_%a --output=out_%a.txt --array=1-nFiles testslurm.sh arg1
#-------------------------------------------------------------
#-------------------------------------------------------------
#
#
#Number of CPU cores to use within one node
#SBATCH -c 12
#
#Define the number of hours the job should run. 
#Maximum runtime is limited to 10 days, ie. 240 hours
#SBATCH --time=24:00:00
#
#Define the amount of RAM used by your job in GigaBytes
#In shared memory applications this is shared among multiple CPUs
#SBATCH --mem=64G
#Do not export the local environment to the compute nodes
#unset SLURM_EXPORT_ENV
#
#Set the number of threads to the SLURM internal variable
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
#
#load the respective software module you intend to use
#module load YourModuleHere
#
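#run the respective binary through SLURM's srun
#make "conda activate" available in this non-interactive shell
#("conda init" only edits shell startup files; it does not activate anything here)
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate suite2p

srun --cpu-bind=verbose python batchfunc.py ~/codes/data/"$1" "$SLURM_ARRAY_TASK_ID"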
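For the concern in the question about 10k jobs that each run only a few seconds, one option is to let each array task handle a chunk of files rather than a single file. The sketch below is an assumption-laden illustration, not the answerer's code: chunk_for_task, n_tasks, the third command-line argument (chunk size), and the commented-out process_file are all hypothetical.

```python
import sys
from pathlib import Path


def chunk_for_task(files, task_id, chunk_size):
    """Return the slice of files handled by a 1-based array task index."""
    start = (task_id - 1) * chunk_size
    return files[start:start + chunk_size]


def n_tasks(n_files, chunk_size):
    """Number of array tasks needed to cover n_files (ceiling division)."""
    return -(-n_files // chunk_size)


def main(argv):
    # Assumed arguments: data directory, array task index, files per task.
    input_dir, task_id, chunk_size = argv[1], int(argv[2]), int(argv[3])
    files = sorted(Path(input_dir).glob("*.fits"))
    for path in chunk_for_task(files, task_id, chunk_size):
        print(f"task {task_id}: processing {path}")
        # process_file(path)  # hypothetical: the per-file analysis goes here
    return 0


# Only run when called with exactly three arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

With 10,000 files and a chunk size of 100, n_tasks gives 100, so the submission would use --array=1-100 and the batch script would need to pass the chunk size as an extra argument. Fewer, longer tasks also help on clusters that cap the array index through the MaxArraySize setting in slurm.conf.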
