CUDA error with a large number of global variables

Question (votes: 2, answers: 1)
__device__ static char Tc0[] = {'0','\0'};
__device__ static char Tc1000[] = {'1','0','0','0','\0'};
__device__ static char Tc1000th[] = {'1','0','0','0','t','h','\0'};
__device__ static char Tc100[] = {'1','0','0','\0'};
__device__ static char Tc100th[] = {'1','0','0','t','h','\0'};

More than 20,000 similar lines follow...

__device__ static char Tczymolytic[] = {'z','y','m','o','l','y','t','i','c','\0'};
__device__ static char Tczymotic[] = {'z','y','m','o','t','i','c','\0'};

int main()
{
}

Compiled with:

nvcc ./test2.cu

Apart from a large number of unused-variable warnings, the following error appears:

ptxas error   : File uses too much global constant data (0x29e58 bytes, 0x10000 max)

What is CUDA using constant memory for here? Is there a way around it?

As @talonmies pointed out, it works with the following compile command:

nvcc -w -std=c++11 -arch=sm_52 -cubin ./test2.cu

The key option here is -arch=sm_52.

cuda
1 answer (4 votes)

In general, what you are doing is legal and should work.

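However, it appears that on the now-deprecated Fermi architecture (sm_20 and sm_21), the assembler tries to place the initialization values of statically defined and initialized device variables into constant memory, which has a 64 KB size limit. On newer, supported architectures this does not happen.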
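The sizes in the ptxas error quoted in the question are hexadecimal, which makes the overshoot hard to read at a glance. As a minimal sketch, they can be decoded and compared with the limit like this; the helper name `parse_constant_data_error` and the regular expression are illustrative assumptions, not part of any CUDA tool:

```python
import re

# Hypothetical pattern for the ptxas message quoted in the question.
PTXAS_LIMIT_RE = re.compile(
    r"too much global constant data \((0x[0-9a-fA-F]+) bytes, (0x[0-9a-fA-F]+) max\)"
)


def parse_constant_data_error(line):
    """Return (used_bytes, max_bytes) from the error line, or None if it does not match."""
    match = PTXAS_LIMIT_RE.search(line)
    if match is None:
        return None
    used, limit = (int(value, 16) for value in match.groups())
    return used, limit


line = "ptxas error   : File uses too much global constant data (0x29e58 bytes, 0x10000 max)"
used, limit = parse_constant_data_error(line)
print(f"used={used} bytes, limit={limit} bytes, over by {used - limit} bytes")
```

For the message in the question this reports 171608 bytes used against a limit of 65536 bytes (64 KB).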

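You are using CUDA 7.5, whose default compilation target is sm_20. Unless you specify an architecture for which the assembler emits static device declarations to global memory (for example -arch=sm_30 or -arch=sm_52), compilation will fail once the total size of those symbols exceeds 64 KB.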

For example:

$ cat make_silly.py
# Generate 100000 eight-byte __device__ arrays (800000 bytes in total).
for i in range(0, 100000):
    print("__device__ static char tx%05d[] = {'0','1','2','3','5','6','7','8'};" % i)

print("")
print("int main() { return 0; }")

$ python make_silly.py > make_silly.cu

$ tail -20 make_silly.cu
__device__ static char tx99982[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99983[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99984[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99985[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99986[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99987[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99988[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99989[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99990[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99991[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99992[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99993[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99994[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99995[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99996[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99997[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99998[] = {'0','1','2','3','5','6','7','8'};
__device__ static char tx99999[] = {'0','1','2','3','5','6','7','8'};

int main() { return 0; }

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

$ nvcc -w -std=c++11 -arch=sm_30 -Xptxas="-v --disable-optimizer-constants" -cubin make_silly.cu 
ptxas info    : 800000 bytes gmem

$ nvcc -w -std=c++11 -arch=sm_20 -Xptxas="-v --disable-optimizer-constants" -cubin make_silly.cu 
ptxas error   : File uses too much global constant data (0xc3500 bytes, 0x10000 max)
ptxas info    : 800000 bytes gmem, 800000 bytes cmem[14]

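Here you can see that compilation fails only for the compute 2.x target. For higher compute capability targets, the assembler happily emits 800 KB of static global memory symbols.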
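The byte counts in the ptxas output above can be cross-checked by hand. The following sketch recomputes them from the generator's parameters. It assumes each array occupies exactly its eight initializer bytes with no padding, which matches the reported 800000 bytes; the function names are illustrative only:

```python
# Parameters copied from make_silly.py and the ptxas messages above.
NUM_ARRAYS = 100000
BYTES_PER_ARRAY = 8        # eight char initializers, no trailing '\0'
CONSTANT_LIMIT = 0x10000   # 64 KB limit reported by ptxas


def total_bytes(num_arrays, bytes_per_array):
    """Total size of all arrays, assuming no padding between them."""
    return num_arrays * bytes_per_array


def max_arrays_within(limit, bytes_per_array):
    """How many such arrays would fit under the limit, ignoring any other constant data."""
    return limit // bytes_per_array


total = total_bytes(NUM_ARRAYS, BYTES_PER_ARRAY)
print(total, hex(total))
print(max_arrays_within(CONSTANT_LIMIT, BYTES_PER_ARRAY))
```

The total is 800000 bytes, which is the 0xc3500 in the sm_20 error message. Under the same no-padding assumption, about 8192 of these arrays would fit below the 64 KB limit.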
