Passing data from the CPU to the GPU without explicitly passing it as a parameter

Question

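Is it possible to pass data from the CPU to the GPU without explicitly passing it as a kernel parameter?

I don't want to pass it as a parameter mainly for syntactic-sugar reasons: I need to pass about 20 constant parameters, and I call two kernels in a row with (almost) identical parameters.

I want something like this: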

__constant__ int* blah;

__global__ myKernel(...){
    ... i want to use blah inside ...
}

int main(){
    ...
    cudaMalloc(...allocate blah...)
    cudaMemcpy(copy my array from CPU to blah)

}

Answer 1

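cudaMemcpyToSymbol seems to be the function you are looking for. It works much like cudaMemcpy, but copies to a __constant__ or __device__ variable by symbol and takes an extra "offset" argument (a byte offset from the start of the symbol), which looks like it makes copying across 2D arrays easier.

(I hesitate to provide code because I can't test it, but see this thread and this post for reference.)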
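For what it's worth, here is a minimal sketch of how that call might be used. It assumes the data fits in a fixed-size `__constant__` array (rather than the `__constant__ int*` plus cudaMalloc from the question); `N`, `result`, and `runConstantDemo` are names made up for illustration. The second copy uses the offset argument to fill the second half of the array.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 8  // hypothetical size, for illustration only

__constant__ int blah[N];   // read-only input, visible to every kernel in this file
__device__ int result[N];   // output, also reachable without a kernel argument

// The kernel takes no parameters; it reads blah and writes result directly.
__global__ void myKernel() {
    int i = threadIdx.x;
    if (i < N) result[i] = blah[i] * 2;
}

// Returns true if the round trip produced the expected values.
bool runConstantDemo() {
    int h_blah[N] = {0, 1, 2, 3, 4, 5, 6, 7};

    // Copy the first half, then the second half at a byte offset into the symbol.
    cudaMemcpyToSymbol(blah, h_blah, (N / 2) * sizeof(int));
    cudaMemcpyToSymbol(blah, h_blah + N / 2, (N / 2) * sizeof(int),
                       (N / 2) * sizeof(int), cudaMemcpyHostToDevice);

    myKernel<<<1, N>>>();

    int h_out[N];
    cudaError_t err = cudaMemcpyFromSymbol(h_out, result, sizeof(h_out));
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    for (int i = 0; i < N; ++i)
        if (h_out[i] != 2 * h_blah[i]) return false;
    return true;
}

int main() {
    printf("%s\n", runConstantDemo() ? "ok" : "failed");
    return 0;
}
```

Note that the symbol is passed as the variable itself, not as a string or as a pointer obtained with `&`.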

Answer 2

Use

__device__

to declare a global variable that lives on the device. It is used in much the same way as

__constant__
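A small sketch of the difference, under the assumption that you want one value the kernel can modify and one it only reads. The names `counter`, `increment`, `addKernel`, and `runDeviceSymbolDemo` are hypothetical. Both variables are set from the host with cudaMemcpyToSymbol; only the `__device__` one is written by the kernel and read back with cudaMemcpyFromSymbol.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int counter;       // global memory: kernels may read and write it
__constant__ int increment;   // constant memory: kernels may only read it

// No parameters: both values are picked up from file scope.
__global__ void addKernel() {
    atomicAdd(&counter, increment);
}

// Returns true if the counter ends up at the expected value.
bool runDeviceSymbolDemo() {
    int h_counter = 10;
    int h_increment = 3;
    cudaMemcpyToSymbol(counter, &h_counter, sizeof(int));
    cudaMemcpyToSymbol(increment, &h_increment, sizeof(int));

    addKernel<<<2, 32>>>();   // 64 threads, each adds `increment` once

    int h_result = 0;
    cudaError_t err = cudaMemcpyFromSymbol(&h_result, counter, sizeof(int));
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    return h_result == h_counter + 64 * h_increment;
}

int main() {
    printf("%s\n", runDeviceSymbolDemo() ? "ok" : "failed");
    return 0;
}
```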

Answer 3

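There are a few approaches you can take. Which one fits depends on how you are going to use the data.

If the data is read-only and the threads within a block read the same location, use __constant__ memory so the read is broadcast to all of them.
If the access pattern touches the neighbors of a given position, or is random (not coalesced), then I would recommend texture memory.
If you need to read and write the data and you know the size of the array at compile time, declare it at file scope (not inside the kernel body) as __device__ int blah[size].

For example: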

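#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Constant memory is limited to 64 KB in total, so 65536 ints would not fit.
#define C_SIZE 8192      // 8192 ints = 32 KB
#define G_SIZE 1048576

__constant__ int c_blah[C_SIZE]; // constant memory
__device__ int g_blah[G_SIZE];   // global memory
// Legacy texture reference: must be declared at file scope.
// This API was removed in CUDA 12; newer toolkits need texture objects instead.
texture<int, 1, cudaReadModeElementType> t_ref;

__global__ void myKernel() {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= G_SIZE) return;
    // get data from constant memory (index wrapped, since the array is smaller)
    int c = c_blah[idx % C_SIZE];
    // get data from global memory
    int g = g_blah[idx];
    // get data from texture memory
    int t = tex1Dfetch(t_ref, idx);
    // operate
    g_blah[idx] = c + g + t;
}

int main() {
    // host arrays live on the heap; 4 MB arrays can overflow the stack
    std::vector<int> c_h_blah(C_SIZE, 1); // initialize them as you want
    std::vector<int> g_h_blah(G_SIZE, 2);
    std::vector<int> t_h_blah(G_SIZE, 3);
    // copy from host to constant memory and to the __device__ array
    cudaMemcpyToSymbol(c_blah, c_h_blah.data(), C_SIZE * sizeof(int));
    cudaMemcpyToSymbol(g_blah, g_h_blah.data(), G_SIZE * sizeof(int));
    // a texture must be bound to device memory, so copy the host data first
    int* t_d_blah = nullptr;
    cudaMalloc(&t_d_blah, G_SIZE * sizeof(int));
    cudaMemcpy(t_d_blah, t_h_blah.data(), G_SIZE * sizeof(int), cudaMemcpyHostToDevice);
    cudaBindTexture(0, t_ref, t_d_blah, G_SIZE * sizeof(int));
    // call your kernel
    dim3 dimBlock(256);
    dim3 dimGrid((G_SIZE + dimBlock.x - 1) / dimBlock.x);
    myKernel<<<dimGrid, dimBlock>>>();
    // copy result from GPU to CPU memory (g_blah is a symbol, not a pointer)
    cudaMemcpyFromSymbol(g_h_blah.data(), g_blah, G_SIZE * sizeof(int));
    printf("g_h_blah[0] = %d\n", g_h_blah[0]);
    cudaUnbindTexture(t_ref);
    cudaFree(t_d_blah);
    return 0;
}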

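You can use all three arrays inside the kernel without passing any arguments to it. Note that this is only a usage example, not an optimized use of the memory hierarchy. In particular, constant memory is not meant to be used this way: when threads read different addresses, the reads are serialized instead of broadcast.

Hope this helps.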

Answer 4

Be careful when using cudaMemcpyToSymbol: it can introduce errors if you try to copy a struct from the CPU to the GPU.
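The answer does not say which errors it has in mind. As a sketch only, here is one way a struct copy might be done, assuming a plain struct with no pointer members (a pointer member would have to hold a device address), the same struct definition on host and device, and sizeof(Params) as the copy size. `Params`, `d_params`, `d_data`, and the two kernels are hypothetical names; bundling the constants like this also covers the case of two consecutive kernels sharing the same parameters.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical parameter bundle; a real one could hold the ~20 constants.
struct Params {
    int n;
    float scale;
    float bias;
};

__constant__ Params d_params;   // one copy, shared by both kernels
__device__ float d_data[256];

__global__ void scaleKernel() {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < d_params.n) d_data[i] = i * d_params.scale;
}

__global__ void biasKernel() {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < d_params.n) d_data[i] += d_params.bias;
}

// Returns true if both kernels saw the parameters set on the host.
bool runStructDemo() {
    Params h_params = {256, 2.0f, 1.0f};
    // Copy the whole struct in one call; the size is sizeof(Params).
    cudaMemcpyToSymbol(d_params, &h_params, sizeof(Params));

    scaleKernel<<<1, 256>>>();
    biasKernel<<<1, 256>>>();

    float h_data[256];
    cudaError_t err = cudaMemcpyFromSymbol(h_data, d_data, sizeof(h_data));
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    // Element 3 should be 3 * 2.0 + 1.0.
    return h_data[3] == 7.0f;
}

int main() {
    printf("%s\n", runStructDemo() ? "ok" : "failed");
    return 0;
}
```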
