nn.Transformer src/tgt/memory masks have no effect

Problem description — votes: 0, answers: 1

nn.TransformerEncoderLayer produces exactly the same output for the same src, no matter what src_key_padding_mask or src_mask is.

Likewise, the nn.TransformerDecoderLayer output is unaffected by any tgt_mask, memory_mask, tgt_key_padding_mask, or memory_key_padding_mask.

Does anyone know what is going wrong? How can I get the masks to work correctly? Many thanks.

import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=6, nhead=2)
encoder_layer.eval()
src = torch.ones((4, 3, 6))  # (seq_len, batch, d_model): every token is the same all-ones vector
encoder_layer(src)
tensor([[[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]]],
       grad_fn=<NativeLayerNormBackward0>)
# All-False mask: nothing is masked out.
print(encoder_layer(src, src_mask=torch.zeros((4, 4)).bool()))
# Causal mask (True = that position may not be attended to).
print(encoder_layer(src, src_mask=torch.tensor(
    [[0, 1, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
).bool()))
# Arbitrary mask pattern.
print(encoder_layer(src, src_mask=torch.tensor(
    [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 1]]
).bool()))
tensor([[[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]]],
       grad_fn=<NativeLayerNormBackward0>)
tensor([[[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]]],
       grad_fn=<NativeLayerNormBackward0>)
tensor([[[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]]],
       grad_fn=<NativeLayerNormBackward0>)

All of the above produce exactly the same result! What is going wrong?

nlp torch transformer-model
1 Answer

0 votes

Problem solved! This is a special case caused by filling the input with all ones. Although the attn_weights differ between masks, attn_weights.sum(dim=-1) always equals 1, and the masks do not change the value matrix, whose rows are all the same vector v. So softmax(scores) @ V reduces to (sum of the weights) * v = v for every mask, and the layer output is identical no matter which positions are masked. With a non-constant src the masks do change the output.
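A minimal sketch of this argument (the seed, shapes, and random src below are my own illustrative choices, not from the post): part 1 reproduces the invariance with hand-rolled attention over identical value rows, and part 2 shows that the same encoder layer does react to src_mask as soon as src is not constant.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# 1) Hand-rolled attention with identical value rows: the mask has no effect.
scores = torch.randn(4, 4)                                 # arbitrary attention scores
mask = torch.triu(torch.ones(4, 4), diagonal=1).bool()     # causal mask (True = blocked)
v = torch.ones(4, 6)                                       # every value row is the same vector
out_plain = F.softmax(scores, dim=-1) @ v
out_masked = F.softmax(scores.masked_fill(mask, float("-inf")), dim=-1) @ v
print(torch.allclose(out_plain, out_masked))               # True: weights sum to 1, value rows are equal

# 2) The same encoder layer with a non-constant src: masks now matter.
encoder_layer = nn.TransformerEncoderLayer(d_model=6, nhead=2)
encoder_layer.eval()
src = torch.randn(4, 3, 6)                                 # (seq_len, batch, d_model)
with torch.no_grad():
    out_unmasked = encoder_layer(src)
    out_causal = encoder_layer(src, src_mask=mask)
print(torch.allclose(out_unmasked, out_causal))            # False: outputs differ under the mask

The value projection and residual connections inside the layer do not change this: projecting identical input rows still yields identical value rows, so the same cancellation happens inside nn.TransformerEncoderLayer.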
