I am getting started with NVSHMEM and wanted to begin with a simple example, but have not had much success.
#include <nvshmem.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    // Initialize the NVSHMEM library
    nvshmem_init();

    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    fprintf(stdout, "PE %d of %d has started ...\n", mype, npes);

    // Shut down the NVSHMEM library
    nvshmem_finalize();
    return 0;
}
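For comparison, the NVSHMEM documentation also describes bootstrapping on top of an existing MPI communicator through nvshmemx_init_attr. Below is a minimal sketch of that variant; I am only assuming that an MPI bootstrap is appropriate here and that the NVSHMEM build on the cluster has MPI support, which may not be the case:

#include <mpi.h>
#include <nvshmem.h>
#include <nvshmemx.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    // Start MPI first, then hand its communicator to NVSHMEM.
    // This path requires an NVSHMEM build with MPI bootstrap support (my assumption).
    MPI_Init(&argc, &argv);

    nvshmemx_init_attr_t attr;
    MPI_Comm comm = MPI_COMM_WORLD;
    attr.mpi_comm = &comm;  // give NVSHMEM the MPI communicator to bootstrap over
    nvshmemx_init_attr(NVSHMEMX_INIT_WITH_MPI_COMM, &attr);

    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    fprintf(stdout, "PE %d of %d has started ...\n", mype, npes);

    nvshmem_finalize();
    MPI_Finalize();
    return 0;
}

Compiling this variant would additionally need the MPI headers and library on the nvcc command line (e.g., OpenMPI's include path and -lmpi); whether this is the missing piece is only a guess on my part.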
The original program is launched with the following sbatch file:
#!/bin/bash -l
#SBATCH --nodes=2 # number of nodes
#SBATCH --ntasks=8 # number of tasks
#SBATCH --ntasks-per-node=4 # number of tasks per node
#SBATCH --gpus-per-task=1 # number of gpu per task
#SBATCH --cpus-per-task=1 # number of cores per task
#SBATCH --time=00:15:00 # time (HH:MM:SS)
#SBATCH --partition=gpu # partition
#SBATCH --account=p200301 # project account
#SBATCH --qos=default # SLURM qos
module load NCCL OpenMPI CUDA NVSHMEM && \
nvcc -rdc=true -ccbin g++ -I $NVSHMEM_HOME/include test.cu -o test \
    -L $NVSHMEM_HOME/lib -lnvshmem_host -lnvshmem_device -lucs -lucp && \
srun -n 8 ./test
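One thing I keep wondering about (purely a guess, not something I have verified on this cluster) is whether srun is actually exposing a PMI interface that NVSHMEM's default bootstrap can pick up. A launch variant along those lines might look like:

export NVSHMEM_BOOTSTRAP=PMI   # select NVSHMEM's PMI bootstrap explicitly (it is also the documented default)
srun --mpi=pmi2 -n 8 ./test    # ask Slurm to provide PMI-2 to the tasks; --mpi=pmix is another option

Whether the PMI flavor Slurm provides here matches what this NVSHMEM build expects is exactly the part I am unsure about.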
Either way, the expected output is something like:
PE 0 of 8 has started ...
PE 1 of 8 has started ...
PE 2 of 8 has started ...
.....
The output I actually get is:
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
PE 0 of 1 has started ...
I think I am missing something important but simple. Can anyone enlighten me?
Did you ever figure this out? I am running into exactly the same problem.