CUDA atomics (CAS, Exch): spin loop hangs

Question (votes: 0, answers: 1)

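I am trying to implement a blocking lock (mutex) in CUDA, loosely following this tutorial.
I tried to implement that code, but the loop hangs.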

As a first test I ran the atomics in a kernel without any loop, with the mutex allocated on the device:

// Host side: allocate the mutex on the device and initialise it to 0 (unlocked).
int state = 0;
int *mutex;
cudaMalloc(&mutex, sizeof(int));
cudaMemcpy(mutex, &state, sizeof(int), cudaMemcpyHostToDevice);

mykernel<<<1, 9>>>(mutex);

// Kernel: each thread tries once to take the lock, then releases it.
__global__ void mykernel(int *mutex) {
    int ti = threadIdx.x + blockDim.x * blockIdx.x;

    printf("TESTATOM0 -- THREAD%d MUTEX:%d\n", ti, *mutex);
    // Argument evaluation order is unspecified, so *mutex may be read before the CAS runs.
    printf("TESTATOM1 -- THREAD%d MUTEX:%d  RETURN:%d\n", ti, *mutex, atomicCAS(mutex, 0, 1));
    printf("TESTATOM2 -- THREAD%d MUTEX:%d\n", ti, *mutex);

    atomicExch(mutex, 0);

    printf("TESTATOM3 -- THREAD%d MUTEX:%d\n", ti, *mutex);
}

Output:

TESTATOM0 -- THREAD0 MUTEX:0
TESTATOM0 -- THREAD1 MUTEX:0
TESTATOM0 -- THREAD2 MUTEX:0
TESTATOM0 -- THREAD3 MUTEX:0
TESTATOM0 -- THREAD4 MUTEX:0
TESTATOM0 -- THREAD5 MUTEX:0
TESTATOM0 -- THREAD6 MUTEX:0
TESTATOM0 -- THREAD7 MUTEX:0
TESTATOM0 -- THREAD8 MUTEX:0
TESTATOM1 -- THREAD0 MUTEX:0  RETURN:0
TESTATOM1 -- THREAD1 MUTEX:0  RETURN:1
TESTATOM1 -- THREAD2 MUTEX:0  RETURN:1
TESTATOM1 -- THREAD3 MUTEX:0  RETURN:1
TESTATOM1 -- THREAD4 MUTEX:0  RETURN:1
TESTATOM1 -- THREAD5 MUTEX:0  RETURN:1
TESTATOM1 -- THREAD6 MUTEX:0  RETURN:1
TESTATOM1 -- THREAD7 MUTEX:0  RETURN:1
TESTATOM1 -- THREAD8 MUTEX:0  RETURN:1
TESTATOM2 -- THREAD0 MUTEX:1
TESTATOM2 -- THREAD1 MUTEX:1
TESTATOM2 -- THREAD2 MUTEX:1
TESTATOM2 -- THREAD3 MUTEX:1
TESTATOM2 -- THREAD4 MUTEX:1
TESTATOM2 -- THREAD5 MUTEX:1
TESTATOM2 -- THREAD6 MUTEX:1
TESTATOM2 -- THREAD7 MUTEX:1
TESTATOM2 -- THREAD8 MUTEX:1
TESTATOM3 -- THREAD0 MUTEX:0
TESTATOM3 -- THREAD1 MUTEX:0
TESTATOM3 -- THREAD2 MUTEX:0
TESTATOM3 -- THREAD3 MUTEX:0
TESTATOM3 -- THREAD4 MUTEX:0
TESTATOM3 -- THREAD5 MUTEX:0
TESTATOM3 -- THREAD6 MUTEX:0
TESTATOM3 -- THREAD7 MUTEX:0
TESTATOM3 -- THREAD8 MUTEX:0

Note that the first thread gets 0 back from atomicCAS and all the other threads get 1.
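That result matches how atomicCAS reports its outcome: it returns the value that was stored before the operation, so only the thread that saw 0 actually performed the swap. A minimal sketch of that behaviour, assuming a single block of 9 threads as above (try_once and count_winners are hypothetical names, not from the tutorial):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread tries once to flip *flag from 0 to 1 and records whether it won.
__global__ void try_once(int *flag, int *winners) {
    // atomicCAS returns the value that was stored *before* the operation.
    int old = atomicCAS(flag, 0, 1);
    if (old == 0) {
        atomicAdd(winners, 1);  // only the thread that saw 0 performed the swap
    }
}

// Hypothetical helper: launches one block of `threads` threads and
// returns how many of them won the compare-and-swap.
int count_winners(int threads) {
    int *flag = nullptr, *winners = nullptr;
    cudaMalloc(&flag, sizeof(int));
    cudaMalloc(&winners, sizeof(int));
    cudaMemset(flag, 0, sizeof(int));
    cudaMemset(winners, 0, sizeof(int));

    try_once<<<1, threads>>>(flag, winners);

    int result = -1;
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&result, winners, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(flag);
    cudaFree(winners);
    return result;
}

int main() {
    printf("winners: %d\n", count_winners(9));
    return 0;
}
```

Because nothing resets the flag in this sketch, the count should come out as one winner regardless of how many threads are launched.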
But when I try to use this in a blocking while loop:

    printf("TESTATOM0 -- THREAD%d MUTEX:%d\n", ti, *mutex);
    while (atomicCAS(mutex, 0, 1) == 1);
    
    for(int i = 0; i < 5; i++){
        printf("BLOCKED -- THREAD%d MUTEX:%d\n", ti, *mutex);
    }
    
    atomicExch(mutex, 0);
    
    for(int i = 0; i < 5; i++){
        printf("UNBLOCKED -- THREAD%d MUTEX:%d\n", ti, *mutex);
    }

The output is:

TESTATOM0 -- THREAD0 MUTEX:0
TESTATOM0 -- THREAD1 MUTEX:0
TESTATOM0 -- THREAD2 MUTEX:0
TESTATOM0 -- THREAD3 MUTEX:0
TESTATOM0 -- THREAD4 MUTEX:0
TESTATOM0 -- THREAD5 MUTEX:0
TESTATOM0 -- THREAD6 MUTEX:0
TESTATOM0 -- THREAD7 MUTEX:0
TESTATOM0 -- THREAD8 MUTEX:0

The program then hangs and I have to kill it from the command line.

Compiled and run as:

nvcc -o ppmer -ccbin "D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe" ppmwriter.cu > .\outputs\AA_CURRENT_ERROR.txt

ppmer > .\outputs\AA_CURRENT_OUTPUT.txt 

I am using an Nvidia GeForce GTX 1650 with Max-Q Design.

I think the loop never terminates, but I don't know why.
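One plausible explanation, hedged because it depends on the target the code was built for: when device code is compiled for a pre-Volta architecture, the threads of a warp do not make independent progress. The thread that wins the CAS leaves the spin loop, but its warp-mates keep spinning, and the winner may never be scheduled to reach atomicExch. A commonly cited workaround is to put the critical section and the release inside the loop body, so the winner finishes before the loop re-converges. The sketch below assumes that pattern; locked_increment and run_locked_increment are hypothetical names, and the pattern is a convention that tends to work rather than something the programming model guarantees on older targets.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread takes the lock once and bumps a plain (non-atomic) counter.
__global__ void locked_increment(int *mutex, volatile int *counter) {
    bool done = false;
    while (!done) {
        // atomicCAS returns the old value: 0 means this thread took the lock.
        if (atomicCAS(mutex, 0, 1) == 0) {
            *counter = *counter + 1;  // critical section (volatile: no stale cached copy)
            __threadfence();          // make the write visible before releasing
            atomicExch(mutex, 0);     // release the lock inside the loop body
            done = true;
        }
    }
}

// Hypothetical helper: runs the kernel and returns the final counter value.
int run_locked_increment(int blocks, int threads) {
    int *mutex = nullptr, *counter = nullptr;
    cudaMalloc(&mutex, sizeof(int));
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(mutex, 0, sizeof(int));
    cudaMemset(counter, 0, sizeof(int));

    locked_increment<<<blocks, threads>>>(mutex, counter);

    int result = -1;
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(mutex);
    cudaFree(counter);
    return result;
}

int main() {
    int threads = 9;
    printf("counter: %d (launched %d threads)\n",
           run_locked_increment(1, threads), threads);
    return 0;
}
```

If the lock works, each thread increments the counter exactly once, so the final value should equal the number of threads launched.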

while-loop cuda atomic
1 answer
0 votes

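Following paleonix's comment, the solution was to compile for the GPU's actual architecture (the GTX 1650 is a Turing part with compute capability 7.5) by adding this compile option: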

-arch=sm_75
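
To check whether a build actually targets the installed GPU, it can help to compare the device's compute capability with the architecture the kernel was compiled for. The following is a rough sketch, not part of the original answer; report_arch, device_compute_capability and compiled_arch are hypothetical names. Depending on the toolkit version, nvcc's default target when no -arch flag is given may be older than the GPU, and this mismatch is what the sketch is meant to surface.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stores the architecture the device code was compiled for (e.g. 750 for sm_75).
__global__ void report_arch(int *out) {
#ifdef __CUDA_ARCH__  // only defined during the device compilation pass
    *out = __CUDA_ARCH__;
#endif
}

// Hypothetical helper: compute capability as major * 10 + minor (e.g. 75), or -1 on error.
int device_compute_capability(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.major * 10 + prop.minor;
}

// Hypothetical helper: returns the __CUDA_ARCH__ value seen by the kernel, or 0 if it did not run.
int compiled_arch() {
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    cudaMemset(d_out, 0, sizeof(int));
    report_arch<<<1, 1>>>(d_out);
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main() {
    int cc = device_compute_capability(0);
    int arch = compiled_arch();
    printf("device compute capability: %d.%d\n", cc / 10, cc % 10);
    printf("kernel compiled for __CUDA_ARCH__ = %d\n", arch);
    if (cc > 0 && arch / 10 < cc) {
        printf("hint: consider compiling with -arch=sm_%d\n", cc);
    }
    return 0;
}
```

If the two values differ, the kernel is running code generated for an older target, and rebuilding with the matching -arch value is worth trying.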