I am building a multi-head self-attention deep learning network using selfAttentionLayer. Here is my code:
numKeyChannels = 32;
numHeads = 1; % or 2, 4, 8, 16 etc.
layers = [
sequenceInputLayer(numKeyChannels);
selfAttentionLayer(numHeads, numKeyChannels,"AttentionMask","causal")
];
lgraph = layerGraph(layers);
net = dlnetwork(lgraph);
summary(net);
No matter how many heads I use (numHeads = 1, 2, 4, 8, or 16), the number of learnable parameters is always 4.2k:
Number of learnable parameters: 4.2k
Shouldn't the number of learnable parameters increase with the number of attention heads?
P.S. selfAttentionLayer is a feature newly introduced in MATLAB R2023a for multi-head self-attention deep learning.
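For reference, here is my back-of-the-envelope parameter count, written as a small Python sketch. It assumes (my assumption, not confirmed in the documentation I have at hand) that the layer holds query, key, value, and output projection matrices with biases, and that numKeyChannels is the *total* channel count shared across all heads:

```python
# Hypothetical parameter count for a self-attention layer, assuming
# numKeyChannels is the TOTAL across heads (heads split a fixed budget).
input_size = 32        # channels from sequenceInputLayer
num_key_channels = 32  # total key channels across all heads

def attention_params(input_size, num_key_channels, num_heads):
    # Per-head dimension: heads divide the fixed channel budget,
    # so the total weight sizes do not depend on num_heads.
    head_dim = num_key_channels // num_heads  # illustration only
    # Q, K, V projections: weight matrix + bias each
    qkv = 3 * (num_key_channels * input_size + num_key_channels)
    # Output projection back to input_size: weight matrix + bias
    out = input_size * num_key_channels + input_size
    return qkv + out

for h in [1, 2, 4, 8, 16]:
    print(h, attention_params(input_size, num_key_channels, h))
```

Under that assumption the total is 4 * (32*32 + 32) = 4224 for every head count, which rounds to the 4.2k that summary reports, so the head-independence may be expected behavior rather than a bug.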