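This is my first post on Stack Overflow, so please bear with me.
I'm currently writing an OpenCL kernel and need to use the built-in sqrt function. For it to work, however, the argument must be a floating-point type. I currently have an integer value and need to convert it to a float so that I can call sqrt(), then convert the result back to an integer so that it can be stored in the magOutput array.
The code below should give you a better idea of what I'm trying to do: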
magOutput[workItemNum] = sqrt(xConv[workItemNum]*xConv[workItemNum] + yConv[workItemNum]*yConv[workItemNum]);
In case the full context is needed, here is the complete code:
__kernel void matrixMultiplication(__global int* input, __global int* xConv, __global int* yConv, __global int* size, __global int* magOutput){
int workItemNum = get_global_id(0); //Work item ID
int workGroupNum = get_group_id(0); //Work group ID
int localGroupID = get_local_id(0); //Work items ID within each work group
// size refers to the total size of a matrix. So for a 3x3 size = 9
float dim = *size;
int dim1 = *size;
int row = sqrt(dim); // only square matrices are used and as such the sqrt of size produces the row length
int current_row = workItemNum/dim; // the current row is calculated by using the current workitem number divided by the total size of the matrix
int col = sqrt(dim); // only square matrices are used and as such the sqrt of size produces the column length
int current_col = workItemNum % dim1; // the current column is calculated by using the current workitem number modulus by the total size of the matrix
// printf("dimension: %i \n",localGroupID);
// This if statement excludes all boundary pixels from the calculation as you require the neighbouring pixel cells
// for this calculation
if (current_col == 0 || current_col == col-1 || current_row == 0 || current_row == row - 1){
/*===============================================================================================================
* The xConv array performs the kernel convolution of the input grey scale values with the following matrix:
*
* [-1 0 +1]
* X - Directional Kernel = [-2 0 +2]
* [-1 0 +1]
*
* This scans across the X direction of the image and enhances all edges in the X-direction
* ===============================================================================================================
*/
xConv[workItemNum] = input[(current_col - 1)*col + current_row - 1]*-1
+ input[(current_col)*col + current_row - 1]*0
+ input[(current_col + 1)*col + current_row - 1]*1
+ input[(current_col - 1)*col + current_row]*-2
+ input[(current_col)*col + current_row]*0
+ input[(current_col + 1)*col + current_row]*2
+ input[(current_col - 1)*col + current_row + 1]*-1
+ input[(current_col)*col + current_row + 1]*0
+ input[(current_col + 1)*col + current_row + 1]*1;
/*===============================================================================================================
* The yConv array performs the kernel convolution of the input grey scale values with the following matrix:
*
* [+1 +2 +1]
* Y - Directional Kernel = [ 0 0 0]
* [-1 -2 -1]
*
* This scans across the Y direction of the image and enhances all edges in the Y-direction
* ===============================================================================================================
*/
yConv[workItemNum] = input[(current_col - 1)*col + current_row - 1]*-1
+ input[(current_col)*col + current_row - 1]*-2
+ input[(current_col + 1)*col + current_row - 1]*-1
+ input[(current_col - 1)*col + current_row]*0
+ input[(current_col)*col + current_row]*0
+ input[(current_col + 1)*col + current_row]*0
+ input[(current_col - 1)*col + current_row + 1]*1
+ input[(current_col)*col + current_row + 1]*2
+ input[(current_col + 1)*col + current_row + 1]*1;
}
//===============================================================================================================
// Calculates the convolution matrix of the X and Y arrays. Does so by squaring each item of the X and Y arrays,
// adding them and taking the square root. This is the basic magnitude formula. This is done for by each workItem
//===============================================================================================================
magOutput[workItemNum] = sqrt(xConv[workItemNum]*xConv[workItemNum] + yConv[workItemNum]*yConv[workItemNum]);
}
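Any suggestions?
So you essentially have a 2D vector of int data type and want to compute its length. Most of OpenCL C is just standard C99 code/syntax, so the most straightforward approach is a standard C-style type cast: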
magOutput[workItemNum] = (int)(sqrt((float)xConv[workItemNum]*(float)xConv[workItemNum] + (float)yConv[workItemNum]*(float)yConv[workItemNum])+0.5f);
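The +0.5f is there for correct rounding: casting a float to int discards the fractional part, so for the non-negative result of sqrt it always rounds down, e.g. (int)3.9f becomes 3. Adding 0.5f immediately before the cast makes the result round to the nearest integer instead. Note that I cast to float first and then square; otherwise integer overflow could occur during the multiplication.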
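A possibly faster way is the following. Here I load the values xConv[workItemNum]/yConv[workItemNum] from global memory (which is really slow) only once, convert them to float, and store them in private memory (registers) as xConvf/yConvf. Then I do the length calculation and rounding, and write the result back to magOutput[workItemNum] in slow global memory.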
const float xConvf = (float)xConv[workItemNum], yConvf = (float)yConv[workItemNum];
magOutput[workItemNum] = (int)(sqrt(xConvf*xConvf + yConvf*yConvf)+0.5f);
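If you want to get really fancy with the built-in math functionality in OpenCL C, you can also do this (it should be as fast as the second approach):
magOutput[workItemNum] = (int)(length((float2)((float)xConv[workItemNum], (float)yConv[workItemNum]))+0.5f);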
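So, 2 takeaways for you:
1. OpenCL C is mostly standard C99, so a type cast is written the usual way: (float)x.
2. While this does not affect your OpenCL C code, I recommend using this lightweight OpenCL-Wrapper for development in C++. It reduces the OpenCL control logic in the CPU code to about 1/4 and makes development much easier.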