How do I compute the outer product of two vectors in a Metal shader?

Question

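I am developing a neural network that runs on the GPU in iOS. Using matrix notation (in order to backpropagate the errors), I need to be able to compute the outer product of two vectors.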


// Outer product of vector A and Vector B
kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         device float *outVector [[ buffer(2) ]],
                         uint id [[ thread_position_in_grid ]]) {
    
    outVector[id] = inVectorA[id] * inVectorB[***?***]; // How to find this position on the thread group (or grid)?
}

Answer 1

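You are using thread_position_in_grid incorrectly. If you dispatch a 2D grid, it should be declared as uint2 or ushort2; otherwise you will only get the x coordinate. See Table 5.7 in the Metal Shading Language Specification.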

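I'm not sure which outer product we are talking about, but I think the output should be a matrix. If you store it linearly, the code to compute outVector should look like this: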

kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         uint2 gridSize [[ threads_per_grid ]],
                         device float *outVector [[ buffer(2) ]],
                         uint2 id [[ thread_position_in_grid ]]) {
    
    outVector[id.y * gridSize.x + id.x] = inVectorA[id.x] * inVectorB[id.y];
}
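
The row-major indexing used by the kernel above can be checked on the CPU. The following is a minimal C++ sketch, not Metal code: the nested loops stand in for the 2D grid, with x and y playing the roles of id.x and id.y, and a.size() the role of gridSize.x. The names outerProductReference and elementAt are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the kernel's indexing: one loop iteration per GPU thread.
std::vector<float> outerProductReference(const std::vector<float>& a,
                                         const std::vector<float>& b) {
    const std::size_t width = a.size();   // corresponds to gridSize.x
    const std::size_t height = b.size();  // corresponds to gridSize.y
    std::vector<float> out(width * height);
    for (std::size_t y = 0; y < height; ++y) {      // id.y
        for (std::size_t x = 0; x < width; ++x) {   // id.x
            out[y * width + x] = a[x] * b[y];
        }
    }
    return out;
}

// Reads element (row, col) back from the linearly stored matrix.
float elementAt(const std::vector<float>& out, std::size_t width,
                std::size_t row, std::size_t col) {
    assert(col < width && row * width + col < out.size());
    return out[row * width + col];
}
```

With this layout, row y of the result holds inVectorA scaled by inVectorB[y], so which vector is bound as inVectorA determines the orientation of the resulting matrix. That may matter when the result is used as a weight gradient.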

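Also, if the grid you dispatch has exactly the size inVectorA x inVectorB (one thread per output element), you can use the threads_per_grid attribute on a kernel argument to find out how big the grid is.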

Alternatively, you can pass the sizes of the vectors along with the vectors themselves.
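
Passing the sizes explicitly is useful when the grid is larger than the matrix, for example when it is rounded up to a whole number of threadgroups. The C++ sketch below emulates that case on the CPU; it is an illustration under assumptions, not the answer's code. OuterProductSizes, outerProductThread, and runEmulatedGrid are hypothetical names, and in a real kernel the sizes would arrive through an extra buffer argument.

```cpp
#include <cassert>
#include <vector>

// Hypothetical size record; on the GPU this could be passed in its own buffer.
struct OuterProductSizes {
    unsigned int lengthA;  // number of elements in vector A (matrix width)
    unsigned int lengthB;  // number of elements in vector B (matrix height)
};

// Work done by one emulated thread at grid position (idX, idY).
void outerProductThread(const float* a, const float* b, float* out,
                        OuterProductSizes sizes,
                        unsigned int idX, unsigned int idY) {
    // Threads outside the matrix do nothing, so a padded grid is safe.
    if (idX >= sizes.lengthA || idY >= sizes.lengthB) {
        return;
    }
    out[idY * sizes.lengthA + idX] = a[idX] * b[idY];
}

// Emulates a dispatch of gridWidth x gridHeight threads on the CPU.
std::vector<float> runEmulatedGrid(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   unsigned int gridWidth,
                                   unsigned int gridHeight) {
    const OuterProductSizes sizes{static_cast<unsigned int>(a.size()),
                                  static_cast<unsigned int>(b.size())};
    // The grid must cover the whole matrix, but it may be larger.
    assert(gridWidth >= sizes.lengthA && gridHeight >= sizes.lengthB);
    std::vector<float> out(a.size() * b.size(), 0.0f);
    for (unsigned int y = 0; y < gridHeight; ++y) {
        for (unsigned int x = 0; x < gridWidth; ++x) {
            outerProductThread(a.data(), b.data(), out.data(), sizes, x, y);
        }
    }
    return out;
}
```

Because the row stride comes from lengthA rather than from the grid width, the output layout stays the same whether the grid matches the matrix exactly or is padded.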

Answer 2

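I was surprised to learn that Metal doesn't have a 2D cross product (the scalar A.x * B.y - A.y * B.x, which is not the same thing as the inner product), so here it is: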

float cross( float2 A, float2 B )
{
    float2 C = A.xy * B.yx;  // <- note B's swizzle
    return C.x - C.y;
}

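So, to answer your question (this snippet uses texture-style read and write calls, so it assumes the data is bound as textures rather than as the device float * buffers from the question):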

float X = cross( inVectorA.read( id ), inVectorB.read( id ) );
outVector.write( X, id );
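
For comparison, here is a small C++ sketch of how the scalar returned by the helper above relates to the matrix-valued outer product asked about in the question. It is an illustration only: Float2, cross2d, outer2d, and crossFromOuter are hypothetical names, and the matrix uses the same linear layout as Answer 1, out[y * 2 + x] = A[x] * B[y].

```cpp
#include <cassert>
#include <vector>

struct Float2 {
    float x;
    float y;
};

// Scalar 2D cross product, as computed by the helper above.
float cross2d(Float2 a, Float2 b) {
    return a.x * b.y - a.y * b.x;
}

// 2x2 outer product stored linearly: out[y * 2 + x] = a[x] * b[y].
std::vector<float> outer2d(Float2 a, Float2 b) {
    return {a.x * b.x, a.y * b.x,
            a.x * b.y, a.y * b.y};
}

// The scalar cross product is the difference of the two off-diagonal entries.
float crossFromOuter(const std::vector<float>& m) {
    assert(m.size() == 4);
    return m[2] - m[1];
}
```

The helper therefore yields one number per pair of 2-component vectors, while the outer product of two vectors of lengths N and M yields N x M values.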
