I am getting started with NVSHMEM and wanted to begin with a simple example, but have not had much success.
#include <nvshmem.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    // Initialize the NVSHMEM library
    nvshmem_init();

    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    fprintf(stdout, "PE %d of %d has started ...\n", mype, npes);

    // Shut down the NVSHMEM library
    nvshmem_finalize();
    return 0;
}
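For comparison, the NVSHMEM documentation also describes bootstrapping on top of an existing MPI communicator through nvshmemx_init_attr. Below is a minimal sketch of that variant; I am only assuming that an MPI bootstrap is appropriate here and that the NVSHMEM build on the cluster has MPI support, which may not be the case:

#include <mpi.h>
#include <nvshmem.h>
#include <nvshmemx.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    // Start MPI first, then hand its communicator to NVSHMEM.
    // This path requires an NVSHMEM build with MPI bootstrap support (my assumption).
    MPI_Init(&argc, &argv);

    nvshmemx_init_attr_t attr;
    MPI_Comm comm = MPI_COMM_WORLD;
    attr.mpi_comm = &comm;  // give NVSHMEM the MPI communicator to bootstrap over
    nvshmemx_init_attr(NVSHMEMX_INIT_WITH_MPI_COMM, &attr);

    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    fprintf(stdout, "PE %d of %d has started ...\n", mype, npes);

    nvshmem_finalize();
    MPI_Finalize();
    return 0;
}

Compiling this variant would additionally need the MPI headers and library on the nvcc command line (e.g., OpenMPI's include path and -lmpi); whether this is the missing piece is only a guess on my part.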
The original program is launched with the following sbatch file:
#!/bin/bash -l
#SBATCH --nodes=2 # number of nodes
#SBATCH --ntasks=8 # number of tasks
#SBATCH --ntasks-per-node=4 # number of tasks per node
#SBATCH --gpus-per-task=1 # number of gpu per task
#SBATCH --cpus-per-task=1 # number of cores per task
#SBATCH --time=00:15:00 # time (HH:MM:SS)
#SBATCH --partition=gpu # partition
#SBATCH --account=p200301 # project account
#SBATCH --qos=default # SLURM qos
module load NCCL OpenMPI CUDA NVSHMEM && \
nvcc -rdc=true -ccbin g++ -I $NVSHMEM_HOME/include test.cu -o test \
    -L $NVSHMEM_HOME/lib -lnvshmem_host -lnvshmem_device -lucs -lucp && \
srun -n 8 ./test
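One thing I keep wondering about (purely a guess, not something I have verified on this cluster) is whether srun is actually exposing a PMI interface that NVSHMEM's default bootstrap can pick up. A launch variant along those lines might look like:

export NVSHMEM_BOOTSTRAP=PMI   # select NVSHMEM's PMI bootstrap explicitly (it is also the documented default)
srun --mpi=pmi2 -n 8 ./test    # ask Slurm to provide PMI-2 to the tasks; --mpi=pmix is another option

Whether the PMI flavor Slurm provides here matches what this NVSHMEM build expects is exactly the part I am unsure about.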
Either way, the expected output is something like:
PE 0 of 8 has started ...
PE 1 of 8 has started ...
PE 2 of 8 has started ...
.....
The output I actually get is:
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
I think I am missing something important but simple. Can anyone enlighten me?
Did you ever figure this out? I am running into exactly the same problem.