I am building a transformer model with tensorflow==2.16.1, and one of its layers is a tensorflow.keras.layers.MultiHeadAttention layer. I implemented the attention layer in the TransformerBlock below:
# Import TensorFlow and Keras for building and training neural network models
import tensorflow as tf
from tensorflow.keras.layers import (
    Dense,
    LayerNormalization,
    MultiHeadAttention,
    Dropout,
)


class TransformerBlock(tf.keras.layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1, **kwargs):
        super(TransformerBlock, self).__init__(**kwargs)
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.ff_dim = ff_dim
        self.rate = rate
        # Sub-layers are created lazily in build()
        self.att = None
        self.ffn = None
        self.layernorm1 = None
        self.layernorm2 = None
        self.dropout1 = None
        self.dropout2 = None

    def build(self, input_shape):
        self.att = MultiHeadAttention(num_heads=self.num_heads, key_dim=self.embed_dim)
        # Position-wise feed-forward network: expand to ff_dim, project back to embed_dim
        self.ffn = tf.keras.Sequential(
            [Dense(self.ff_dim, activation="relu"), Dense(self.embed_dim)]
        )
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(self.rate)
        self.dropout2 = Dropout(self.rate)
        super(TransformerBlock, self).build(input_shape)

    def call(self, inputs, training=None, padding_mask=None, causal_mask=True, qa=False):
        mask = None
        seq_len = tf.shape(inputs)[1]
        batch_size = tf.shape(inputs)[0]
        if padding_mask is not None:
            # (batch, seq_len) -> (batch, 1, seq_len): every query position
            # shares the same key-side padding mask; then broadcast to the
            # (batch, seq_len, seq_len) shape MultiHeadAttention expects
            padding_mask_reshaped = tf.cast(
                tf.reshape(padding_mask, (batch_size, 1, seq_len)), dtype=tf.float32
            )
            mask = tf.broadcast_to(
                padding_mask_reshaped, (batch_size, seq_len, seq_len)
            )
        # Self-attention; Keras combines attention_mask with a lower-triangular
        # causal mask when use_causal_mask is set
        attn_output = self.att(
            inputs, inputs, attention_mask=mask, use_causal_mask=causal_mask
        )
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)  # residual connection + norm
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        out2 = self.layernorm2(out1 + ffn_output)  # residual connection + norm
        return out2
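For context, here is a minimal sketch of how I drive the block; the shapes, the random inputs, and the mask values are illustrative, not my actual model:

import numpy as np
import tensorflow as tf

# Illustrative shapes and data (assumptions, not the real pipeline)
batch, seq_len, embed_dim = 2, 8, 32
x = tf.random.normal((batch, seq_len, embed_dim))           # already-embedded inputs
padding_mask = np.ones((batch, seq_len), dtype="float32")   # 1 = real token
padding_mask[:, 6:] = 0.0                                   # last two positions padded

block = TransformerBlock(embed_dim=embed_dim, num_heads=4, ff_dim=64)
out = block(x, training=False, padding_mask=tf.constant(padding_mask))
print(out.shape)  # (2, 8, 32)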
Whenever I run this TransformerBlock, I get the following warning:
lib/python3.11/site-packages/keras/src/layers/layer.py:877: UserWarning: Layer 'value' (of type EinsumDense) was passed an input with a mask attached to it. However, this layer does not support masking and will therefore destroy the mask information. Downstream layers will not see the mask.
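For context, one upstream setup that attaches a Keras mask to a tensor, and as far as I can tell can produce this kind of warning when the masked tensor reaches MultiHeadAttention's internal EinsumDense projections, is an Embedding layer with mask_zero=True. This is a guess at the mechanism, not code from my model:

import tensorflow as tf

# Hypothetical upstream layer (an assumption, not from the question):
# mask_zero=True attaches a Keras mask to the embedding output
token_ids = tf.constant([[5, 3, 1, 0]])  # 0 = padding id
emb = tf.keras.layers.Embedding(input_dim=10, output_dim=8, mask_zero=True)
x = emb(token_ids)                      # output carries a Keras mask
print(emb.compute_mask(token_ids))      # [[ True  True  True False]]

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)
out = mha(x, x)  # the masked input reaches MHA's internal EinsumDense
                 # projections, which do not support masking -> UserWarning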
However, passing padding_mask together with use_causal_mask=True does change model performance. For example, if I pass use_causal_mask=False, the model performs unrealistically well, exactly as you would predict without a causal mask, which tells me the causal masking is working. I observe the same behavior if I create the causal mask myself, merge it with the padding_mask, and pass the combined mask to the attention_mask arg, as sketched below.
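Roughly, the manual merge I mean looks like this (a sketch; the names and shapes are illustrative):

import tensorflow as tf

batch_size, seq_len = 2, 4
padding_mask = tf.constant([[1, 1, 1, 0], [1, 1, 0, 0]])  # 1 = real token

# Lower-triangular causal mask: position i may attend to positions <= i
causal = tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
causal = tf.broadcast_to(causal[tf.newaxis, :, :], (batch_size, seq_len, seq_len))

# Key-side padding mask broadcast over all query positions
pad = tf.cast(tf.reshape(padding_mask, (batch_size, 1, seq_len)), tf.float32)
pad = tf.broadcast_to(pad, (batch_size, seq_len, seq_len))

# A position is attendable only if both masks allow it; pass the result
# as attention_mask (with use_causal_mask turned off)
merged = causal * pad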
When I searched online for why I get this warning, I found very little information about it. Does anyone here know why I can't stop getting this warning, and what it implies?
I tried to replicate this code in Google Colab using TensorFlow 2.16.1 and Python 3.10.12, but did not notice any warning. I am attaching the gist file for your reference.
If the warning persists, please let us know more details about the problem, such as the platform and the TensorFlow and Python version details, to understand the issue better.