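How can I use, inside a kernel, an array of structs that has already been dynamically allocated on the host, without passing the array as a kernel argument? There is plenty of documentation online, but none of it works for the program below.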
Note: before posting this question, I have already looked at the following questions:
1) copying host memory to cuda __device__ variable
2) Global variable in CUDA
3) Is there any way to dynamically allocate constant memory? CUDA
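What I have tried so far, without success: dynamically allocating the array of structs with cudaMalloc(), then calling cudaMemcpyToSymbol() with the pointer returned by cudaMalloc() to copy it into a __device__ variable that the kernel can use. Code attempt: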
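NBody.cu (error checking with cudaStatus has mostly been omitted for readability, and the function that reads data from a file into the dynamic array has been removed):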
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdlib.h>

#define BLOCK 256

struct nbody {
    float x, y, vx, vy, m;
};
typedef struct nbody nbody;

// Global declarations
nbody* particle;

// Device variables
__device__ unsigned int d_N;   // Kernel can successfully access this
__device__ nbody d_particle;   // Update: part of problem was here with (*)

// Aim of kernel: to print contents of array of structs without using kernel argument
__global__ void step_cuda_v1() {
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < d_N) {
        printf("%.f\n", d_particle.x);
    }
}

int main() {
    unsigned int N = 10;
    unsigned int I = 1;

    cudaMallocHost((void**)&particle, N * sizeof(nbody)); // Host allocation
    cudaError_t cudaStatus;
    for (int i = 0; i < N; i++) particle[i].x = i;

    nbody* particle_buf; // device buffer
    cudaSetDevice(0);
    cudaMalloc((void**)&particle_buf, N * sizeof(nbody)); // Allocate device mem
    cudaMemcpy(particle_buf, particle, N * sizeof(nbody), cudaMemcpyHostToDevice); // Copy data into device mem
    cudaMemcpyToSymbol(d_particle, &particle_buf, sizeof(nbody*)); // Copy pointer to data into __device__ var
    cudaMemcpyToSymbol(d_N, &N, sizeof(unsigned int)); // This works fine

    int NThreadBlock = (N + BLOCK - 1) / BLOCK;
    for (int iteration = 0; iteration <= I; iteration++) {
        step_cuda_v1<<<NThreadBlock, BLOCK>>>();
        //step_cuda_v1<<<1, 5>>>(particle_buf);
        cudaDeviceSynchronize();
        cudaStatus = cudaGetLastError();
        if (cudaStatus != cudaSuccess) {
            fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(cudaStatus));
            exit(-1);
        }
    }
    return 0;
}
Output:
ERROR: kernel launch failed.
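For comparison, here is a minimal sketch of the same cudaMalloc() + cudaMemcpyToSymbol() pattern with the declaration the "Update" comment hints at. In the code above, d_particle has type nbody, so cudaMemcpyToSymbol(d_particle, &particle_buf, sizeof(nbody*)) writes the bytes of a pointer into a struct, and d_particle.x then reinterprets those pointer bits as a float instead of indexing the array. Declaring the symbol as a pointer (__device__ nbody* d_particle;) and indexing it as d_particle[i] lets the kernel reach the array with no kernel arguments. This sketch does not claim to explain the exact error message above. The kernel name step_cuda_v2, the helper step_on_device, the CHECK macro and the "x += vx" update are hypothetical placeholders, not part of the original program.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK 256

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

struct nbody { float x, y, vx, vy, m; };

__device__ unsigned int d_N;     // number of particles
__device__ nbody* d_particle;    // POINTER to the cudaMalloc'ed array (note the '*')

// No kernel arguments: the array is reached through the __device__ pointer.
// Placeholder update (hypothetical): advance x by vx for one unit time step.
__global__ void step_cuda_v2()
{
    unsigned int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < d_N) {
        d_particle[i].x += d_particle[i].vx;
    }
}

// Uploads h_particle, publishes the device pointer via cudaMemcpyToSymbol,
// runs one step and copies the updated particles back into h_particle.
void step_on_device(nbody* h_particle, unsigned int N)
{
    if (N == 0) return;  // a grid of 0 blocks is an invalid launch configuration

    nbody* particle_buf = nullptr;
    CHECK(cudaMalloc(&particle_buf, N * sizeof(nbody)));
    CHECK(cudaMemcpy(particle_buf, h_particle, N * sizeof(nbody),
                     cudaMemcpyHostToDevice));

    // Copy the pointer VALUE (sizeof(nbody*) bytes) into the pointer symbol.
    CHECK(cudaMemcpyToSymbol(d_particle, &particle_buf, sizeof(nbody*)));
    CHECK(cudaMemcpyToSymbol(d_N, &N, sizeof(unsigned int)));

    unsigned int blocks = (N + BLOCK - 1) / BLOCK;
    step_cuda_v2<<<blocks, BLOCK>>>();
    CHECK(cudaGetLastError());       // launch-configuration errors
    CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK(cudaMemcpy(h_particle, particle_buf, N * sizeof(nbody),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(particle_buf));
}

int main()
{
    const unsigned int N = 10;
    nbody particle[N];
    for (unsigned int i = 0; i < N; i++)
        particle[i] = nbody{(float)i, 0.0f, 0.5f, 0.0f, 1.0f};

    step_on_device(particle, N);

    for (unsigned int i = 0; i < N; i++) {
        assert(particle[i].x == (float)i + 0.5f);  // x moved by vx
        printf("%.1f\n", particle[i].x);
    }
    return 0;
}
```

The only essential differences from the original attempt are the `*` in the __device__ declaration and the indexed access d_particle[i]; checking cudaGetLastError() right after the launch and cudaDeviceSynchronize() separately also makes it easier to tell a bad launch configuration apart from a fault inside the kernel.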
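Summary:
How can I use, inside a kernel, an array of structs that was dynamically allocated on the host, without passing the array as a kernel argument? This seems to be a ... with...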
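If the particle count has a known upper bound at compile time, a different sketch avoids the device pointer altogether: declare a statically sized __device__ array and copy the struct data itself (not a pointer) into it with cudaMemcpyToSymbol(), then read it back with cudaMemcpyFromSymbol(). The capacity MAX_N = 4096, the kernel step_fixed, the helper step_on_device_fixed and the "x += vx" update are all hypothetical assumptions for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK 256
#define MAX_N 4096   // hypothetical compile-time capacity

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

struct nbody { float x, y, vx, vy, m; };

__device__ unsigned int d_N;            // number of particles in use
__device__ nbody d_particles[MAX_N];    // statically sized device array

// No kernel arguments: the kernel indexes the __device__ array directly.
__global__ void step_fixed()
{
    unsigned int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < d_N) {
        d_particles[i].x += d_particles[i].vx;  // placeholder update
    }
}

// Copies N structs into d_particles, runs one step, copies them back.
void step_on_device_fixed(nbody* h_particle, unsigned int N)
{
    if (N == 0) return;  // nothing to do; a 0-block grid would be invalid
    if (N > MAX_N) {
        fprintf(stderr, "N=%u exceeds MAX_N=%d\n", N, MAX_N);
        exit(EXIT_FAILURE);
    }
    CHECK(cudaMemcpyToSymbol(d_particles, h_particle, N * sizeof(nbody)));
    CHECK(cudaMemcpyToSymbol(d_N, &N, sizeof(unsigned int)));

    step_fixed<<<(N + BLOCK - 1) / BLOCK, BLOCK>>>();
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaMemcpyFromSymbol(h_particle, d_particles, N * sizeof(nbody)));
}

int main()
{
    const unsigned int N = 10;
    nbody particle[N];
    for (unsigned int i = 0; i < N; i++)
        particle[i] = nbody{(float)i, 0.0f, 0.25f, 0.0f, 1.0f};

    step_on_device_fixed(particle, N);

    for (unsigned int i = 0; i < N; i++) {
        assert(particle[i].x == (float)i + 0.25f);
        printf("%.2f\n", particle[i].x);
    }
    return 0;
}
```

The trade-off is that the storage is reserved for MAX_N particles whether or not they are used, and N can never exceed that limit, so this only fits when a reasonable maximum is known in advance; otherwise the __device__ pointer approach is the more flexible choice.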