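I downloaded Apple's TrueDepth streamer sample and am trying to add a compute pipeline to it. I think I'm retrieving the compute results, but I'm not sure, because they all seem to be zero.
I'm a beginner at iOS development, so there may be plenty of mistakes; please bear with me!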
Pipeline setup (since the kernel outputs a float3, I'm not really sure how to create the results buffer):
int resultsCount = CVPixelBufferGetWidth(depthFrame) * CVPixelBufferGetHeight(depthFrame);
//because I will be output 3 floats for each value in depthframe
id<MTLBuffer> resultsBuffer = [self.device newBufferWithLength:(sizeof(float) * 3 * resultsCount) options:MTLResourceOptionCPUCacheModeDefault];
_threadgroupSize = MTLSizeMake(16, 16, 1);
// Calculate the number of rows and columns of threadgroups given the width of the input image
// Ensure that you cover the entire image (or more) so you process every pixel
_threadgroupCount.width = (inTexture.width + _threadgroupSize.width - 1) / _threadgroupSize.width;
_threadgroupCount.height = (inTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;
// Since we're only dealing with a 2D data set, set depth to 1
_threadgroupCount.depth = 1;
id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
[computeEncoder setComputePipelineState:_computePipelineState];
[computeEncoder setTexture: inTexture atIndex:0];
[computeEncoder setBuffer:resultsBuffer offset:0 atIndex:1];
[computeEncoder setBytes:&intrinsics length:sizeof(intrinsics) atIndex:0];
[computeEncoder dispatchThreadgroups:_threadgroupCount
threadsPerThreadgroup:_threadgroupSize];
[computeEncoder endEncoding];
// Finalize rendering here & push the command buffer to the GPU
[commandBuffer commit];
//for testing
[commandBuffer waitUntilCompleted];
I added the following compute kernel:
kernel void
calc(texture2d<float, access::read> inTexture [[texture(0)]],
device float3 *resultsBuffer [[buffer(1)]],
constant float3x3& cameraIntrinsics [[ buffer(0) ]],
uint2 gid [[thread_position_in_grid]])
{
float val = inTexture.read(gid).x * 1000.0f;
float xrw = (gid.x - cameraIntrinsics[2][0]) * val / cameraIntrinsics[0][0];
float yrw = (gid.y - cameraIntrinsics[2][1]) * val / cameraIntrinsics[1][1];
int vertex_id = ((gid.y * inTexture.get_width()) + gid.x);
resultsBuffer[vertex_id] = float3(xrw, yrw, val);
}
Code for viewing the buffer results (I tried two different ways, and both currently print all zeros):
void *output = [resultsBuffer contents];
for (int i = 0; i < 10; ++i) {
NSLog(@"value is %f", *(float *)(output) ); //= *(float *)(output + 4 * i);
}
NSData *data = [NSData dataWithBytesNoCopy:resultsBuffer.contents length:(sizeof(float) * 3 * resultsCount)freeWhenDone:NO];
float *finalArray = new float [resultsCount * 3];
[data getBytes:&finalArray[0] length:sizeof(finalArray)];
for (int i = 0; i < 10; ++i) {
NSLog(@"here is output %f", finalArray[i]);
}
I see a few problems here, but none of them is related to your Metal code itself.
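In your first output loop, you print only the first element of the results buffer, ten times. The first element can legitimately be 0, which would lead you to believe that all the results are zero. But when I change the first log line to
NSLog(@"value is %f", ((float *)output)[i]);
and run the kernel on a test image, I see different values printed.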
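The other problem is with your getBytes:length: call. You want to pass the number of bytes to copy, but sizeof(finalArray) is actually the size of the finalArray pointer (8 bytes on the 64-bit devices that have a TrueDepth camera, 4 on a 32-bit one), not the total size of the buffer it points to. This is an extremely common mistake in C and C++ code.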
Instead, you can pass the same number of bytes you used when allocating the space:
[data getBytes:&finalArray[0] length:(sizeof(float) * 3 * resultsCount)];
You should then find that the printed values are the same (non-zero) values as in the previous step.