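I am following this CUDA video tutorial on YouTube; the code is provided in the second half of the video. It is a simple CUDA program that adds the elements of two arrays. So if we have a first array called a and a second array called b, the final value of a[i] is: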
a[i] += b[i];
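The problem is that, no matter what I do, the first four elements of the final output are always strange. The program fills both arrays with random values below 1000, so the final value at every index should be between 0 and 2000. However, regardless of the random seed, each of the first four results is always either a very large, out-of-range number or zero.
For indices greater than 3 the output seems fine. Here is my code: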
#include <iostream>
#include <cuda.h>
#include <stdlib.h>
#include <ctime>
using namespace std;

__global__ void AddInts(int *a, int *b, int count){
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < count){
        a[id] += b[id];
    }
}

int main(){
    srand(time(NULL));
    int count = 100;
    int *h_a = new int[count];
    int *h_b = new int[count];
    for (int i = 0; i < count; i++){ // Populate both arrays with 100 random values
        h_a[i] = rand() % 1000; // values in range 0 to 999
        h_b[i] = rand() % 1000;
    }
    cout << "Prior to addition:" << endl;
    for (int i = 0; i < 10; i++){ // Print out the first ten of each
        cout << h_a[i] << " " << h_b[i] << endl;
    }
    int *d_a, *d_b; // device copies of those arrays
    if(cudaMalloc(&d_a, sizeof(int) * count) != cudaSuccess) // allocate device memory for a
    {
        cout << "Nope!";
        return -1;
    }
    if(cudaMalloc(&d_b, sizeof(int) * count) != cudaSuccess) // allocate device memory for b
    {
        cout << "Nope!";
        cudaFree(d_a);
        return -2;
    }
    if(cudaMemcpy(d_a, h_a, sizeof(int) * count, cudaMemcpyHostToDevice) != cudaSuccess)
    {
        cout << "Could not copy!" << endl;
        cudaFree(d_a);
        cudaFree(d_b);
        return -3;
    }
    if(cudaMemcpy(d_b, h_b, sizeof(int) * count, cudaMemcpyHostToDevice) != cudaSuccess)
    {
        cout << "Could not copy!" << endl;
        cudaFree(d_b);
        cudaFree(d_a);
        return -4;
    }
    AddInts<<<count / 256 + 1, 256>>>(d_a, d_b, count); // magic of int division: enough blocks of 256 threads to cover count
    if(cudaMemcpy(h_a, d_a, sizeof(int) * count, cudaMemcpyDeviceToHost) != cudaSuccess)
    { // copy from device back to host failed
        delete[] h_a;
        delete[] h_b;
        cudaFree(d_a);
        cudaFree(d_b);
        cout << "Error: Copy data back to host failed" << endl;
        return -5;
    }
    delete[] h_a;
    delete[] h_b;
    cudaFree(d_a);
    cudaFree(d_b);
    for(int i = 0; i < 10; i++){
        cout << "It's " << h_a[i] << endl;
    }
    return 0;
}
I compile with:
nvcc threads_blocks_grids.cu -o threads
The output of nvcc -version is:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
This is my output:
Prior to addition:
771 177
312 257
303 5
291 819
735 359
538 404
718 300
540 943
598 456
619 180
It's 42984048
It's 0
It's 42992112
It's 0
It's 1094
It's 942
It's 1018
It's 1483
It's 1054
It's 799
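You delete the host arrays before printing them. The final loop reads h_a[i] after delete[] h_a has already run, and that is undefined behavior. The freed block may already have been partly overwritten (for example by the allocator's own bookkeeping), which would explain why only the first few elements look wrong. Move the printing loop above the delete[] calls and the output should be correct.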
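To make the ordering concrete, here is a minimal host-only sketch of the fix. It is not the original program: the kernel launch and the cudaMemcpy back to the host are replaced by a plain loop (the hypothetical helper add_ints_host), so that it runs without a GPU. The wrapper add_then_release is also a name invented for illustration. The only point it demonstrates is that the results are read before delete[] is called.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only stand-in for the AddInts kernel plus the copy back to the host.
// Assumption for illustration: the real program would launch the kernel and
// call cudaMemcpy(..., cudaMemcpyDeviceToHost) at this point instead.
void add_ints_host(int *a, const int *b, int count) {
    for (int i = 0; i < count; i++) {
        a[i] += b[i];
    }
}

// Mirrors the buffer lifetime of the question's main(), with the order fixed:
// read the results first, release the host arrays last.
std::vector<int> add_then_release(const std::vector<int> &a_in,
                                  const std::vector<int> &b_in) {
    assert(a_in.size() == b_in.size());
    const int count = static_cast<int>(a_in.size());

    int *h_a = new int[count];
    int *h_b = new int[count];
    for (int i = 0; i < count; i++) {
        h_a[i] = a_in[i];
        h_b[i] = b_in[i];
    }

    add_ints_host(h_a, h_b, count);

    // 1. Use the results while h_a is still alive
    //    (the question's printing loop belongs here).
    std::vector<int> results(h_a, h_a + count);

    // 2. Only now release the host arrays.
    delete[] h_a;
    delete[] h_b;
    return results;
}
```

Calling add_then_release with the first three rows of the question's input, {771, 312, 303} and {177, 257, 5}, gives 948, 569 and 308. Holding the host data in a std::vector from the start would likely be simpler still, since there would be no manual delete[] to misplace.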