Why is the number of learnable parameters of selfAttentionLayer independent of the number of heads in MATLAB?


I am building a multi-head self-attention deep learning network using selfAttentionLayer. The code is as follows:

numKeyChannels = 32;
numHeads = 1; % or 2, 4, 8, 16 etc.
layers = [
    sequenceInputLayer(numKeyChannels);
    selfAttentionLayer(numHeads, numKeyChannels,"AttentionMask","causal")
 ];
dlNetwork = layerGraph(layers);
dlNetwork = dlnetwork(dlNetwork);
summary(dlNetwork);

Regardless of the number of heads (numHeads = 1, 2, 4, 8, or 16), the number of learnable parameters is always 4.2K:

Number of learnable parameters: 4.2k

Shouldn't the number of learnable parameters increase with the number of attention heads?
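For reference, the head-independent count is consistent with the standard multi-head attention layout, where the key/query/value channels are split evenly across heads rather than replicated per head. A minimal arithmetic sketch in Python (assuming query/key/value projections with bias plus one output projection back to the input size, which reproduces the 4.2K figure but is not taken from MATLAB's documentation):

```python
def self_attention_param_count(input_size, num_key_channels, num_heads):
    """Count learnables in a standard multi-head self-attention layer."""
    assert num_key_channels % num_heads == 0
    head_dim = num_key_channels // num_heads
    # Per head: Q, K, V projections of shape (head_dim x input_size), each with bias.
    per_head_qkv = 3 * (head_dim * input_size + head_dim)
    # Concatenated over all heads: num_heads * head_dim == num_key_channels rows
    # in total, so num_heads cancels out of the product.
    qkv = num_heads * per_head_qkv
    # One shared output projection back to input_size, with bias.
    out = input_size * num_key_channels + input_size
    return qkv + out

for h in (1, 2, 4, 8, 16):
    print(h, self_attention_param_count(32, 32, h))  # 4224 for every h
```

With input_size = numKeyChannels = 32 this gives 4 * (32*32 + 32) = 4224 parameters for every head count, matching the reported "4.2k": increasing numHeads shrinks each head's dimension instead of adding weights.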

P.S. selfAttentionLayer is a feature newly introduced in MATLAB R2023a for multi-head self-attention deep learning.

matlab deep-learning self-attention multihead-attention