I am building a multi-head self-attention deep learning network using selfAttentionLayer. Here is my code:
numKeyChannels = 32;
numHeads = 1; % or 2, 4, 8, 16 etc.
layers = [
sequenceInputLayer(numKeyChannels);
selfAttentionLayer(numHeads, numKeyChannels,"AttentionMask","causal")
];
lgraph = layerGraph(layers);
net = dlnetwork(lgraph);
summary(net);
No matter how many heads I use (numHeads = 1, 2, 4, 8, or 16), the number of learnable parameters is always 4.2k:
Number of learnable parameters: 4.2k
Shouldn't the number of learnable parameters increase with the number of attention heads?
P.S. selfAttentionLayer is a feature newly introduced in MATLAB R2023a for multi-head self-attention deep learning.
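For reference, here is my back-of-the-envelope parameter count, written as a small Python sketch. It assumes (my assumption, not confirmed in the documentation I have at hand) that the layer holds query, key, value, and output projection matrices with biases, and that numKeyChannels is the *total* channel count shared across all heads:

```python
# Hypothetical parameter count for a self-attention layer, assuming
# numKeyChannels is the TOTAL across heads (heads split a fixed budget).
input_size = 32        # channels from sequenceInputLayer
num_key_channels = 32  # total key channels across all heads

def attention_params(input_size, num_key_channels, num_heads):
    # Per-head dimension: heads divide the fixed channel budget,
    # so the total weight sizes do not depend on num_heads.
    head_dim = num_key_channels // num_heads  # illustration only
    # Q, K, V projections: weight matrix + bias each
    qkv = 3 * (num_key_channels * input_size + num_key_channels)
    # Output projection back to input_size: weight matrix + bias
    out = input_size * num_key_channels + input_size
    return qkv + out

for h in [1, 2, 4, 8, 16]:
    print(h, attention_params(input_size, num_key_channels, h))
```

Under that assumption the total is 4 * (32*32 + 32) = 4224 for every head count, which rounds to the 4.2k that summary reports, so the head-independence may be expected behavior rather than a bug.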