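I wrote a WGSL compute shader that outputs a result for the input it is given as an argument.
Now I need to run the shader multiple times with different inputs. All compute shader steps should be the same every time. I can create a new pipeline each time and get the correct results, but execution is very slow, probably because of all the overhead of creating a new pipeline, initializing the data in the buffers, and so on.
How can I use a pre-created WGSL pipeline multiple times (on different inputs) without creating a new pipeline each time?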
let adapter = await navigator.gpu.requestAdapter();
let device = await adapter.requestDevice();
let module = device.createShaderModule({code: `@group(0) @binding(0) var<storage, read_write> sample: array<u32, 720>;
@group(0) @binding(1) var<storage, read_write> table: array<array<u32, 720>>;
@group(0) @binding(2) var<storage, read_write> result: array<u32>;
@compute @workgroup_size(1,1,1) fn computeThis (@builtin(global_invocation_id) id: vec3<u32>)
{
var diff : u32 = 0;
for (var i : u32 = 0; i < 720; i++)
{
diff += (table[id.x][i] - sample[i])*(table[id.x][i] - sample[i]);
}
result[id.x] = diff;
}
`, });
let pipeline = device.createComputePipeline({layout: 'auto', compute: {module}});
let sampleBuffer = device.createBuffer({size: sample.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST});
let tableBuffer = device.createBuffer({size: table.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST});
let inputBuffer = device.createBuffer({size: input.byteLength, usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST});
let resultBuffer = device.createBuffer({size: input.byteLength, usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST});
let bindGroup = device.createBindGroup({layout: pipeline.getBindGroupLayout(0), entries: [{binding: 0, resource: { buffer: sampleBuffer }},{binding: 1, resource: { buffer: tableBuffer }},{binding: 2, resource: { buffer: inputBuffer }}]});
let encoder = device.createCommandEncoder();
let pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(LEN,1,1);
pass.end();
encoder.copyBufferToBuffer(inputBuffer, 0, resultBuffer, 0, resultBuffer.size);
device.queue.writeBuffer(sampleBuffer, 0, sample);
device.queue.writeBuffer(tableBuffer, 0, table);
device.queue.writeBuffer(inputBuffer, 0, input);
device.queue.submit([encoder.finish()]);
await resultBuffer.mapAsync(GPUMapMode.READ);
let result = new Uint32Array(resultBuffer.getMappedRange().slice());
resultBuffer.unmap(); // only resultBuffer was mapped, so the other buffers need no unmap() call
How can I use a pre-created WGSL pipeline multiple times (on different inputs)?
You create different buffers and bind groups:
let pass = encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(LEN,1,1);
pass.setBindGroup(0, bindGroup2);
pass.dispatchWorkgroups(LEN,1,1);
pass.setBindGroup(0, bindGroup3);
pass.dispatchWorkgroups(LEN,1,1);
pass.end();
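As a rough sketch of where bindGroup2 and bindGroup3 could come from: the helpers below are hypothetical (createBindGroupsForSamples, encodeAllDispatches, and the outputBuffer name are not from the original post). They assume each input is a Uint32Array, that the table stays the same across runs so one tableBuffer can be shared, and that each run writes to its own output buffer.

```javascript
// Hypothetical helper (not from the original post): builds one bind group
// per input sample. Every bind group shares the same pipeline and tableBuffer.
function createBindGroupsForSamples(device, pipeline, tableBuffer, samples, resultByteLength) {
  const layout = pipeline.getBindGroupLayout(0);
  return samples.map((sample) => {
    const sampleBuffer = device.createBuffer({
      size: sample.byteLength,
      usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
    });
    device.queue.writeBuffer(sampleBuffer, 0, sample);
    // Each run gets its own output buffer so results do not overwrite each other.
    const outputBuffer = device.createBuffer({
      size: resultByteLength,
      usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
    });
    const bindGroup = device.createBindGroup({
      layout,
      entries: [
        { binding: 0, resource: { buffer: sampleBuffer } },
        { binding: 1, resource: { buffer: tableBuffer } },
        { binding: 2, resource: { buffer: outputBuffer } },
      ],
    });
    return { bindGroup, outputBuffer };
  });
}

// Encodes one dispatch per bind group inside a single compute pass.
function encodeAllDispatches(device, pipeline, runs, workgroupCount) {
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline); // set once, reused by every dispatch
  for (const { bindGroup } of runs) {
    pass.setBindGroup(0, bindGroup);
    pass.dispatchWorkgroups(workgroupCount, 1, 1);
  }
  pass.end();
  return encoder.finish();
}
```

The returned command buffer would go to device.queue.submit([...]) as in the question, with one copyBufferToBuffer per outputBuffer added before finish() if the results need to be read back.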
Or you upload new data to the same buffers and then run your pass again (though this will be slower).
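A minimal sketch of that second option, assuming the pipeline, the bind group, and the three buffers are created once up front. The names submitRun, readResult, outputBuffer, and readbackBuffer are hypothetical and not from the original post; readbackBuffer is assumed to have MAP_READ | COPY_DST usage, like resultBuffer in the question.

```javascript
// Hypothetical helper (not from the original post). The pipeline, bind group,
// and buffers are created once and passed in; only the data changes per run.
function submitRun(device, pipeline, bindGroup, buffers, sample, workgroupCount) {
  const { sampleBuffer, outputBuffer, readbackBuffer } = buffers;
  // Overwrite the previous input in place; the bind group still refers to sampleBuffer.
  device.queue.writeBuffer(sampleBuffer, 0, sample);
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(workgroupCount, 1, 1);
  pass.end();
  // Copy the shader output into the mappable buffer for readback.
  encoder.copyBufferToBuffer(outputBuffer, 0, readbackBuffer, 0, readbackBuffer.size);
  device.queue.submit([encoder.finish()]);
}

// Reads the result back, then unmaps so the buffer can be a copy target again.
async function readResult(readbackBuffer) {
  await readbackBuffer.mapAsync(GPUMapMode.READ);
  const result = new Uint32Array(readbackBuffer.getMappedRange().slice());
  readbackBuffer.unmap();
  return result;
}
```

Each run would then be a call to submitRun(...) followed by await readResult(buffers.readbackBuffer). Waiting for the readback before the next submit keeps the mapped buffer from being used as a copy destination, which is also one reason this variant can be slower than recording several dispatches at once.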
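Note: GPU cores are extremely slow when used with @workgroup_size(1,1,1). In fact, in this article, a single core on an M1 Mac was 30x slower than JavaScript, and a single core on an NVidia 2070 Super was 19x slower than JavaScript on an AMD Ryzen 9 3900XT.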
GPUs get their speed from massive parallelization.
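To make that concrete, here is a hedged sketch of the question's shader with a larger workgroup size. The value 64 is an assumption (a commonly used size, not a number from the original post), and workgroupCountFor is a hypothetical helper. The bounds check assumes the buffer bound at binding 2 holds exactly one u32 per table row, so that arrayLength(&result) equals the row count.

```javascript
// Assumed workgroup size; 64 is a common choice, not a value from the original post.
const WORKGROUP_SIZE = 64;

// Same computation as the shader in the question, but 64 rows per workgroup.
const shaderCode = `
@group(0) @binding(0) var<storage, read_write> sample: array<u32, 720>;
@group(0) @binding(1) var<storage, read_write> table: array<array<u32, 720>>;
@group(0) @binding(2) var<storage, read_write> result: array<u32>;

@compute @workgroup_size(${WORKGROUP_SIZE}) fn computeThis(@builtin(global_invocation_id) id: vec3<u32>) {
  // The last workgroup may contain invocations past the end of the data.
  if (id.x >= arrayLength(&result)) {
    return;
  }
  var diff : u32 = 0;
  for (var i : u32 = 0; i < 720; i++) {
    let d = table[id.x][i] - sample[i];
    diff += d * d;
  }
  result[id.x] = diff;
}
`;

// Number of workgroups needed so that each of `count` rows gets one invocation.
function workgroupCountFor(count, workgroupSize = WORKGROUP_SIZE) {
  return Math.ceil(count / workgroupSize);
}
```

With this shader the dispatch would become pass.dispatchWorkgroups(workgroupCountFor(LEN), 1, 1) instead of pass.dispatchWorkgroups(LEN, 1, 1).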
Required disclosure: I am a contributor to the linked article.