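I am trying to implement blocking (a mutex) in CUDA, loosely following this tutorial.
I tried implementing that code, but the loop hangs.
Along the same lines, I implemented this test in a kernel, where the mutex is allocated on the device: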
int state = 0;
int * mutex;
cudaMalloc(&mutex, sizeof(int) );
cudaMemcpy(mutex, &state , sizeof(int), cudaMemcpyHostToDevice);
mykernel<<< 1, 9>>>(mutex);
__global__ void mykernel(int * mutex){
    int ti = threadIdx.x+blockDim.x*blockIdx.x;
    printf("TESTATOM0 -- THREAD%d MUTEX:%d\n", ti, *mutex);
    // atomicCAS returns the value that was stored at mutex before the call
    printf("TESTATOM1 -- THREAD%d MUTEX:%d RETURN:%d\n", ti, *mutex, atomicCAS(mutex, 0, 1));
    printf("TESTATOM2 -- THREAD%d MUTEX:%d\n", ti, *mutex);
    atomicExch(mutex, 0);
    printf("TESTATOM3 -- THREAD%d MUTEX:%d\n", ti, *mutex);
}
Output:
TESTATOM0 -- THREAD0 MUTEX:0
TESTATOM0 -- THREAD1 MUTEX:0
TESTATOM0 -- THREAD2 MUTEX:0
TESTATOM0 -- THREAD3 MUTEX:0
TESTATOM0 -- THREAD4 MUTEX:0
TESTATOM0 -- THREAD5 MUTEX:0
TESTATOM0 -- THREAD6 MUTEX:0
TESTATOM0 -- THREAD7 MUTEX:0
TESTATOM0 -- THREAD8 MUTEX:0
TESTATOM1 -- THREAD0 MUTEX:0 RETURN:0
TESTATOM1 -- THREAD1 MUTEX:0 RETURN:1
TESTATOM1 -- THREAD2 MUTEX:0 RETURN:1
TESTATOM1 -- THREAD3 MUTEX:0 RETURN:1
TESTATOM1 -- THREAD4 MUTEX:0 RETURN:1
TESTATOM1 -- THREAD5 MUTEX:0 RETURN:1
TESTATOM1 -- THREAD6 MUTEX:0 RETURN:1
TESTATOM1 -- THREAD7 MUTEX:0 RETURN:1
TESTATOM1 -- THREAD8 MUTEX:0 RETURN:1
TESTATOM2 -- THREAD0 MUTEX:1
TESTATOM2 -- THREAD1 MUTEX:1
TESTATOM2 -- THREAD2 MUTEX:1
TESTATOM2 -- THREAD3 MUTEX:1
TESTATOM2 -- THREAD4 MUTEX:1
TESTATOM2 -- THREAD5 MUTEX:1
TESTATOM2 -- THREAD6 MUTEX:1
TESTATOM2 -- THREAD7 MUTEX:1
TESTATOM2 -- THREAD8 MUTEX:1
TESTATOM3 -- THREAD0 MUTEX:0
TESTATOM3 -- THREAD1 MUTEX:0
TESTATOM3 -- THREAD2 MUTEX:0
TESTATOM3 -- THREAD3 MUTEX:0
TESTATOM3 -- THREAD4 MUTEX:0
TESTATOM3 -- THREAD5 MUTEX:0
TESTATOM3 -- THREAD6 MUTEX:0
TESTATOM3 -- THREAD7 MUTEX:0
TESTATOM3 -- THREAD8 MUTEX:0
Note that the first thread gets 0 back from atomicCAS and the remaining threads get 1.
But when I try to use this in a blocking while loop:
printf("TESTATOM0 -- THREAD%d MUTEX:%d\n", ti, *mutex);
while (atomicCAS(mutex, 0, 1) == 1);
for(int i = 0; i < 5; i++){
    printf("BLOCKED -- THREAD%d MUTEX:%d\n", ti, *mutex);
}
atomicExch(mutex, 0);
for(int i = 0; i < 5; i++){
    printf("UNBLOCKED -- THREAD%d MUTEX:%d\n", ti, *mutex);
}
The output is:
TESTATOM0 -- THREAD0 MUTEX:0
TESTATOM0 -- THREAD1 MUTEX:0
TESTATOM0 -- THREAD2 MUTEX:0
TESTATOM0 -- THREAD3 MUTEX:0
TESTATOM0 -- THREAD4 MUTEX:0
TESTATOM0 -- THREAD5 MUTEX:0
TESTATOM0 -- THREAD6 MUTEX:0
TESTATOM0 -- THREAD7 MUTEX:0
TESTATOM0 -- THREAD8 MUTEX:0
The program then hangs, and I have to kill it from the command line.
It is compiled and run as:
nvcc -o ppmer -ccbin "D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe" ppmwriter.cu > .\outputs\AA_CURRENT_ERROR.txt
ppmer > .\outputs\AA_CURRENT_OUTPUT.txt
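I am using an Nvidia GeForce GTX 1650 with Max-Q Design.
I think the loop is not terminating, but I don't know why.
Following paleonix's comment, the solution is to add this compile option: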
-arch=sm_75
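As a note on why this flag likely helps: the GTX 1650 is a Turing GPU (compute capability 7.5), and targeting sm_75 lets the code use the independent thread scheduling introduced with Volta, so the thread holding the lock can make progress while the other threads of its warp spin. Without an -arch flag, nvcc builds for its default architecture, which I assume was older than Volta in this toolkit. Below is a minimal sketch, not a definitive implementation, of a lock loop that keeps the critical section inside the branch that acquired the lock, a pattern often suggested for code that cannot rely on independent thread scheduling. The kernel name lockedIncrement and the counter variable are hypothetical and not part of my original code, and this pattern is still not guaranteed to be deadlock-free on every pre-Volta device and compiler version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the original post): every thread adds 1 to
// *counter while holding the lock stored in *mutex (0 = free, 1 = taken).
__global__ void lockedIncrement(int *mutex, volatile int *counter) {
    bool done = false;
    while (!done) {
        // atomicCAS returns the old value, so 0 means this thread took the lock.
        if (atomicCAS(mutex, 0, 1) == 0) {
            *counter = *counter + 1;  // critical section
            __threadfence();          // publish the write before unlocking
            atomicExch(mutex, 0);     // release the lock
            done = true;
        }
    }
}

int main() {
    // Report the compute capability, which determines the -arch value to use.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }
    printf("compute capability: %d.%d\n", prop.major, prop.minor);

    int *mutex = nullptr;
    int *counter = nullptr;
    cudaMalloc(&mutex, sizeof(int));
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(mutex, 0, sizeof(int));    // lock starts out free
    cudaMemset(counter, 0, sizeof(int));

    lockedIncrement<<<1, 9>>>(mutex, counter);  // same launch shape as above
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int result = 0;
    cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost);
    // 9 is what results if each thread entered the critical section once.
    printf("counter = %d\n", result);

    cudaFree(mutex);
    cudaFree(counter);
    return result == 9 ? 0 : 1;
}
```

Assuming the file is saved as lock_sketch.cu (a made-up name), it can be built with nvcc -arch=sm_75 -o lock_sketch lock_sketch.cu. The difference from my original loop is that no thread waits in a separate spin statement while the lock holder is stalled behind it: acquiring, working, and releasing all happen in the same branch.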