nn.TransformerEncoderLayer produces exactly the same output for the same src, no matter what src_key_padding_mask or src_mask I pass.
Likewise, the output of nn.TransformerDecoderLayer is unaffected by any tgt_mask, memory_mask, tgt_key_padding_mask, or memory_key_padding_mask.
Does anyone know what is going wrong? How can I get the masks to work correctly? Many thanks.
import torch
import torch.nn as nn
encoder_layer = nn.TransformerEncoderLayer(d_model=6, nhead=2)
encoder_layer.eval()
src = torch.ones((4, 3, 6))  # (seq_len, batch, d_model), every position identical
encoder_layer(src)
tensor([[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]]],
grad_fn=<NativeLayerNormBackward0>)
print(encoder_layer(src, src_mask=torch.zeros((4, 4)).bool()))
print(encoder_layer(src, src_mask=torch.tensor(
    [[0, 1, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
).bool()))
print(encoder_layer(src, src_mask=torch.tensor(
    [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 1]]
).bool()))
tensor([[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]]],
grad_fn=<NativeLayerNormBackward0>)
tensor([[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]]],
grad_fn=<NativeLayerNormBackward0>)
tensor([[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]],
[[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910],
[ 0.9927, 0.2251, -1.5199, 1.2508, -1.0397, 0.0910]]],
grad_fn=<NativeLayerNormBackward0>)
All of the calls above produce exactly the same result! What is going wrong?
Problem solved! This is a special case caused by filling the input with all ones. Although attn_weights differs from mask to mask, attn_weights.sum(dim=-1) always equals 1, and the mask does not affect the value matrix, whose row vectors are all identical. Therefore, since each row of attn_weights sums to 1, multiplying attn_weights by the value matrix yields the same result regardless of the mask.
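To double-check this explanation, here is a minimal sketch (the names attn_a, attn_b, value, and causal_mask are mine, not from the original code). It first shows that a row-stochastic attention matrix multiplied by a value matrix with identical rows always gives the same output, and then that the mask does change the encoder output once src is no longer constant:

import torch
import torch.nn as nn

torch.manual_seed(0)

# 1. Row-stochastic weights times a value matrix with identical rows:
#    every output row is just that shared row, whatever the weights are.
attn_a = torch.softmax(torch.rand(4, 4), dim=-1)  # one attention pattern
attn_b = torch.softmax(torch.rand(4, 4), dim=-1)  # a different pattern
value = torch.ones(4, 6)                          # identical row vectors, like src above
print(torch.allclose(attn_a @ value, attn_b @ value))  # True

# 2. With a non-constant src, the mask takes effect as expected.
encoder_layer = nn.TransformerEncoderLayer(d_model=6, nhead=2)
encoder_layer.eval()
src = torch.rand(4, 3, 6)  # positions differ now
causal_mask = torch.triu(torch.ones(4, 4), diagonal=1).bool()  # True = not attended
with torch.no_grad():
    out_unmasked = encoder_layer(src)
    out_masked = encoder_layer(src, src_mask=causal_mask)
print(torch.allclose(out_unmasked, out_masked))  # False

So the masks were never broken: the all-ones src simply makes every weighted average of identical value rows collapse to the same vector.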