Accessing a dynamically allocated array on the device (without passing it as a kernel argument)

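How can a kernel use an array of structs that was dynamically allocated on the host, without the array being passed as a kernel argument? There is plenty of documentation online, but none of it seems to apply to the program below.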

Note: before posting this question, I studied the following questions:

  1. copying host memory to cuda __device__ variable
  2. Global variable in CUDA
  3. Is there any way to dynamically allocate constant memory? CUDA

What I have tried so far, without success:

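  1. Dynamically allocate the array of structs on the device with cudaMalloc(), then
  2. Call cudaMemcpyToSymbol() with the pointer returned by cudaMalloc(), to copy it into a __device__ variable that the kernel can use.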
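The two steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the `__device__` symbol is declared as a pointer (`nbody*`), so that copying `sizeof(nbody*)` bytes into it stores the device address returned by `cudaMalloc()`. The names `d_sum`, `sum_x` and `run_demo` are hypothetical and exist only so the result can be checked on the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct nbody { float x, y, vx, vy, m; };

__device__ unsigned int d_N;     // element count, set from the host
__device__ nbody* d_particle;    // holds a device ADDRESS, not the data itself
__device__ float d_sum;          // hypothetical accumulator, only for checking

// The kernel takes no arguments; it reaches the array through d_particle.
__global__ void sum_x() {
    unsigned int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < d_N) {
        printf("%.0f\n", d_particle[i].x);
        atomicAdd(&d_sum, d_particle[i].x);
    }
}

// Hypothetical helper: returns the sum of x over N particles, or -1 on error.
float run_demo(unsigned int N) {
    nbody* h = (nbody*)calloc(N, sizeof(nbody));
    if (!h) return -1.0f;
    for (unsigned int i = 0; i < N; i++) h[i].x = (float)i;

    nbody* buf = NULL;               // device buffer
    float zero = 0.0f, sum = -1.0f;
    bool ok =
        cudaMalloc((void**)&buf, N * sizeof(nbody)) == cudaSuccess &&
        cudaMemcpy(buf, h, N * sizeof(nbody), cudaMemcpyHostToDevice) == cudaSuccess &&
        // Copy the pointer VALUE (the device address) into the __device__ symbol.
        cudaMemcpyToSymbol(d_particle, &buf, sizeof(buf)) == cudaSuccess &&
        cudaMemcpyToSymbol(d_N, &N, sizeof(N)) == cudaSuccess &&
        cudaMemcpyToSymbol(d_sum, &zero, sizeof(zero)) == cudaSuccess;
    if (ok) {
        sum_x<<<(N + 255) / 256, 256>>>();
        ok = cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpyFromSymbol(&sum, d_sum, sizeof(sum)) == cudaSuccess;
    }
    cudaFree(buf);
    free(h);
    return ok ? sum : -1.0f;
}

int main() {
    printf("sum = %.0f\n", run_demo(10));
    return 0;
}
```

With `N = 10` the x values are 0 through 9, so a successful run should report a sum of 45. The point of the sketch is that the symbol and the value copied into it have the same type and size: a pointer is copied into a pointer.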

Attempted code:

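NBody.cu (most of the error checking with cudaStatus has been omitted for readability, and the function that reads data from a file into the dynamic array has been removed):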

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <stdlib.h>

#define BLOCK 256

struct nbody {
    float x, y, vx, vy, m;
};
typedef struct nbody nbody;

// Global declarations
nbody* particle;

// Device variables
__device__ unsigned int d_N;   // Kernel can successfully access this
__device__ nbody* d_particle;  // Update: part of the problem was the missing pointer (*) here

// Aim of kernel: print the contents of the array of structs without using a kernel argument
__global__ void step_cuda_v1() {
    int i = threadIdx.x + blockDim.x * blockIdx.x;

    if (i < d_N) {
        printf("%.f\n", d_particle[i].x);  // index through the device pointer
    }
}

int main() {
    unsigned int N = 10;
    unsigned int I = 1;

    cudaMallocHost((void**)&particle, N * sizeof(nbody)); // Host allocation

    cudaError_t cudaStatus;
    for (int i = 0; i < N; i++) particle[i].x = i;

    nbody* particle_buf; // device buffer
    cudaSetDevice(0);

    cudaMalloc((void**)&particle_buf, N * sizeof(nbody)); // Allocate device mem
    cudaMemcpy(particle_buf, particle, N * sizeof(nbody), cudaMemcpyHostToDevice); // Copy data into device mem
    cudaMemcpyToSymbol(d_particle, &particle_buf, sizeof(nbody*)); // Copy pointer to data into __device__ var
    cudaMemcpyToSymbol(d_N, &N, sizeof(unsigned int)); // This works fine

    int NThreadBlock = (N + BLOCK - 1) / BLOCK;
    for (int iteration = 0; iteration <= I; iteration++) {

        step_cuda_v1<<<NThreadBlock, BLOCK>>>();
        //step_cuda_v1<<<1, 5>>>(particle_buf);
        cudaDeviceSynchronize();
        cudaStatus = cudaGetLastError();
        if (cudaStatus != cudaSuccess)
        {
            fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(cudaStatus));
            exit(-1);
        }
    }
    return 0;
}

Output:

"Error: kernel launch failed."
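Because the posted code checks for errors only once, after cudaDeviceSynchronize(), the message alone does not say which call failed. A minimal sketch of one way to separate launch-time errors from errors raised while the kernel runs is shown here; the helper `ok`, the kernel `noop` and the function `launch_and_check` are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failing call and tell the caller whether it succeeded.
static bool ok(cudaError_t e, const char* what) {
    if (e != cudaSuccess)
        fprintf(stderr, "ERROR in %s: %s\n", what, cudaGetErrorString(e));
    return e == cudaSuccess;
}

// Placeholder kernel standing in for the real one.
__global__ void noop() {}

// Returns 0 on success, 1 for a launch error, 2 for an execution error.
int launch_and_check() {
    noop<<<1, 32>>>();
    // Errors detected at launch (for example an invalid configuration).
    if (!ok(cudaGetLastError(), "kernel launch")) return 1;
    // Errors that occur while the kernel executes (for example a bad memory access).
    if (!ok(cudaDeviceSynchronize(), "kernel execution")) return 2;
    return 0;
}

int main() {
    return launch_and_check();
}
```

Applying the same kind of check to each cudaMalloc(), cudaMemcpy() and cudaMemcpyToSymbol() call would show whether the failure comes from the setup or from the kernel itself.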

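Summary:

  • How can the contents of an array of structs be printed from a kernel without passing the array as a kernel argument?
  • Coding in C with VS2019 and CUDA 10.2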

memory-management cuda gpu nvidia