tensorflow.keras.layers.MultiHeadAttention warns that the query layer is destroying the mask


I am using tensorflow==2.16.1 to build a transformer model, one of whose layers is a tensorflow.keras.layers.MultiHeadAttention layer.

I implemented the attention layer in the TransformerBlock below:

# Import TensorFlow and Keras for building and training neural network models
import tensorflow as tf
from tensorflow.keras.layers import (
    Dense,
    LayerNormalization,
    MultiHeadAttention,
    Dropout,
)

class TransformerBlock(tf.keras.layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1, **kwargs):

        super(TransformerBlock, self).__init__(**kwargs)

        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.ff_dim = ff_dim
        self.rate = rate

        
        # Sub-layers are created in build() once the input shape is known
        self.att = None
        self.ffn = None

        self.layernorm1 = None 
        self.layernorm2 = None 

        self.dropout1 = None
        self.dropout2 = None

    def build(self, input_shape):

        # Multi-head self-attention sub-layer
        self.att = MultiHeadAttention(num_heads=self.num_heads, key_dim=self.embed_dim)

        # Position-wise feed-forward network
        self.ffn = tf.keras.Sequential(
            [Dense(self.ff_dim, activation="relu"), Dense(self.embed_dim)]
        )

        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)

        self.dropout1 = Dropout(self.rate)
        self.dropout2 = Dropout(self.rate)

        super(TransformerBlock, self).build(input_shape)

    def call(self, inputs, training=None, padding_mask=None, causal_mask=True, qa=False):

        mask = None

        seq_len = tf.shape(inputs)[1]
        batch_size = tf.shape(inputs)[0]

        if padding_mask is not None:
            # Broadcast the (batch, seq_len) padding mask over the query dimension
            # so it becomes a (batch, seq_len, seq_len) attention mask.
            padding_mask_reshaped = tf.cast(
                tf.reshape(padding_mask, (batch_size, 1, seq_len)), dtype=tf.float32
            )
            mask = tf.broadcast_to(
                padding_mask_reshaped, (batch_size, seq_len, seq_len)
            )

        # Self-attention; the causal mask is combined with attention_mask internally.
        attn_output = self.att(
            inputs, inputs, attention_mask=mask, use_causal_mask=causal_mask
        )

        attn_output = self.dropout1(attn_output, training=training)

        # Residual connection + layer norm around the attention sub-layer
        out1 = self.layernorm1(inputs + attn_output)

        ffn_output = self.ffn(out1)

        ffn_output = self.dropout2(ffn_output, training=training)

        # Residual connection + layer norm around the feed-forward sub-layer
        out2 = self.layernorm2(out1 + ffn_output)

        return out2
        

Whenever I use this TransformerBlock, I get the following warning:

lib/python3.11/site-packages/keras/src/layers/layer.py:877: UserWarning: Layer 'value' (of type EinsumDense) was passed an input with a mask attached to it. However, this layer does not support masking and will therefore destroy the mask information. Downstream layers will not see the mask.
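
Judging from the wording, the warning fires when the tensor fed into MultiHeadAttention already carries an implicit Keras mask, which the layer's internal EinsumDense projections cannot propagate. A common source of such a mask is an upstream Embedding layer with mask_zero=True; the sketch below is an assumption about the surrounding model (not code from the question) and only shows how to check whether the block's inputs carry such a mask:

# Hypothetical check, assuming the block's inputs come from an Embedding
# layer with mask_zero=True (not shown in the question).
import tensorflow as tf
from tensorflow.keras.layers import Embedding

tokens = tf.constant([[5, 8, 2, 0, 0]])  # assume 0 is the padding id
emb = Embedding(input_dim=100, output_dim=32, mask_zero=True)
x = emb(tokens)

# Keras attaches the computed mask to the output tensor; if this prints a
# boolean tensor, that implicit mask is what MultiHeadAttention's internal
# EinsumDense projections complain about.
print(getattr(x, "_keras_mask", None))

If that is the case here, it would be consistent with the behavior described below: the explicit attention_mask / use_causal_mask arguments still take effect; only the implicit mask is dropped.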

However, passing the padding_mask and use_causal_mask=True does change model performance. For example, if I pass use_causal_mask=False, the model performs unrealistically well, exactly as you would predict without a causal mask, which tells me the causal mask is being applied. I observe the same behavior if I build the causal mask myself, merge it with the padding_mask, and pass the result to the attention_mask arg.
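
For reference, one way to build that combined mask explicitly (an illustrative sketch, not necessarily the exact code you ran) is to AND a lower-triangular causal mask with the broadcast padding mask and hand the result to attention_mask:

import tensorflow as tf

def combined_attention_mask(padding_mask):
    # padding_mask: (batch, seq_len), 1 for real tokens, 0 for padding (assumption)
    batch_size = tf.shape(padding_mask)[0]
    seq_len = tf.shape(padding_mask)[1]

    # Lower-triangular causal mask: position i may attend to positions <= i.
    causal = tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
    causal = tf.cast(causal, tf.bool)
    causal = tf.broadcast_to(causal[tf.newaxis, :, :], (batch_size, seq_len, seq_len))

    # Broadcast the padding mask over the query dimension: (batch, 1, seq_len).
    padding = tf.cast(padding_mask, tf.bool)[:, tf.newaxis, :]

    # A key position is attendable only if it is causal-visible and not padding.
    return tf.logical_and(causal, padding)

# Usage inside call():
#   mask = combined_attention_mask(padding_mask)
#   attn_output = self.att(inputs, inputs, attention_mask=mask)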

When I search the internet for why I am getting this warning, I find very little about it. Does anyone here know why I cannot stop getting this warning, and what it means?

tensorflow keras deep-learning transformer-model multihead-attention
1 Answer

0 votes

I tried to replicate this code in Google Colab with TensorFlow 2.16.1 and Python 3.10.12, but did not observe any warning. I have attached the gist file for your reference.

If the warning persists, please share more details about the issue, such as your platform and your TensorFlow and Python version details, so we can better understand the problem.
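
If you want to compare against a standalone run, a minimal check along these lines (hypothetical shapes and hyperparameters) exercises the block directly. Note that raw float inputs carry no implicit Keras mask, which is likely why an isolated reproduction does not show the warning:

# Minimal standalone check of TransformerBlock (hypothetical sizes; adjust
# embed_dim / num_heads / ff_dim to match your model).
import tensorflow as tf

block = TransformerBlock(embed_dim=32, num_heads=2, ff_dim=64)

x = tf.random.normal((4, 10, 32))                             # (batch, seq_len, embed_dim)
pad = tf.cast(tf.random.uniform((4, 10)) > 0.2, tf.float32)   # 1 = token, 0 = padding

out = block(x, training=False, padding_mask=pad)
print(out.shape)  # (4, 10, 32); the warning, if any, appears on the first call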
