Keras MultiHeadAttention TypeError when trying to replicate Transformer model: Where is Type 'None' coming from?


I am trying to replicate a simple Transformer model with TensorFlow. My dataset (flat_xtrain) is a set of time-series signals of varying amplitude. Here is my preprocessing:

import tensorflow

dataset = tensorflow.data.Dataset.from_tensor_slices(flat_xtrain[:num_train_samples])
n_steps = 512
window_length = n_steps + 1 # target = input shifted 1 character ahead
dataset = dataset.window(window_length, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(window_length))
batch_size = 32
dataset = dataset.shuffle(10000).batch(batch_size)
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.prefetch(1)
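For reference, the windowing plus shift-by-one targeting above can be sketched in plain NumPy on a toy signal (this is an illustration only, not the tf.data pipeline itself), which makes the resulting (input, target) shapes easy to inspect:

```python
import numpy as np

# Sketch (assumption): the same sliding-window + shift-by-one targeting
# as the tf.data pipeline, applied in NumPy to a toy 1-D signal.
signal = np.arange(10, dtype=np.float32)   # stand-in for flat_xtrain
n_steps = 4
window_length = n_steps + 1                # target = input shifted 1 step ahead

# All windows of length window_length with shift=1 (drop_remainder behavior).
windows = np.stack([signal[i:i + window_length]
                    for i in range(len(signal) - window_length + 1)])
inputs, targets = windows[:, :-1], windows[:, 1:]

print(inputs.shape, targets.shape)   # (6, 4) (6, 4)
print(inputs[0], targets[0])         # [0. 1. 2. 3.] [1. 2. 3. 4.]
```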

Here is my model:

from tensorflow import keras
from tensorflow.keras import layers

embed_dim = 512
dense_dim = 2048
num_heads = 2
max_steps = 500 
decoder_inputs = layers.Input(shape=(None, ), dtype=tensorflow.float16)
positional_encoding = PositionalEncoding(max_steps, max_dims=embed_dim) #This is from Aurelien Geron's book
decoder_in = positional_encoding(decoder_inputs)
mha_1 = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
mha_2 = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
layernorm_1 = layers.LayerNormalization()
layernorm_2 = layers.LayerNormalization()
layernorm_3 = layers.LayerNormalization()

Z = decoder_in
for N in range(6):
    Z = mha_1(query=Z, value=Z, key=Z, use_causal_mask=True)
    Z = layernorm_1(Z + decoder_in)
    skip_2 = Z
    Z = mha_2(query=Z, value=decoder_in, key=decoder_in)
    Z = layernorm_2(Z + skip_2)
    skip_3 = Z
    Z = layers.TimeDistributed(layers.Dense(dense_dim, activation="relu"))(Z) 
    Z = layers.TimeDistributed(layers.Dense(embed_dim))(Z) 
    Z = layernorm_3(Z + skip_3)
outputs = layers.TimeDistributed(keras.layers.Dense(1))(Z)
model = keras.Model(decoder_inputs, outputs)

Below is the traceback of the error I get... I'm not sure why the error occurs at this step. I only started studying this a few months ago, so I would appreciate any help or ideas.

--> 177     Z = mha_1(query=Z, value=Z, key=Z, use_causal_mask=True)
    178     Z = layernorm_1(Z + decoder_in)
    179     skip_2 = Z

1 frames

/usr/local/lib/python3.10/dist-packages/keras/initializers/initializers.py in _compute_fans(shape)
   1145         receptive_field_size = 1
   1146         for dim in shape[:-2]:
-> 1147             receptive_field_size *= dim
   1148         fan_in = shape[-2] * receptive_field_size
   1149         fan_out = shape[-1] * receptive_field_size

TypeError: Exception encountered when calling layer "multi_head_attention" (type MultiHeadAttention).

unsupported operand type(s) for *=: 'int' and 'NoneType'

Call arguments received by layer "multi_head_attention" (type MultiHeadAttention):
  • query=tf.Tensor(shape=(1, None, None), dtype=float16)
  • value=tf.Tensor(shape=(1, None, None), dtype=float16)
  • key=tf.Tensor(shape=(1, None, None), dtype=float16)
  • attention_mask=None
  • return_attention_scores=False
  • training=None
  • use_causal_mask=True
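The failing loop from Keras's `_compute_fans` can be reproduced in plain Python (a sketch for illustration, assuming the kernel shapes shown; not the actual Keras source). Since `query` arrives with shape `(1, None, None)`, the last (feature) axis is `None`, so the kernel shape `MultiHeadAttention` tries to build contains `None`, and the initializer's loop hits exactly this `TypeError`:

```python
# Sketch (assumption): a simplified replica of keras initializers'
# _compute_fans, to show where the 'NoneType' operand comes from.
def compute_fans(shape):
    receptive_field_size = 1
    for dim in shape[:-2]:
        receptive_field_size *= dim   # raises if any leading dim is None
    fan_in = shape[-2] * receptive_field_size
    fan_out = shape[-1] * receptive_field_size
    return fan_in, fan_out

# A fully known kernel shape works fine:
print(compute_fans((3, 4, 5)))        # (12, 15)

# A shape containing None (an unknown static dimension) reproduces the error:
try:
    compute_fans((None, 2, 512))
except TypeError as e:
    print(e)   # unsupported operand type(s) for *=: 'int' and 'NoneType'
```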
Tags: tensorflow, keras, deep-learning, tensorflow2.0, transformer-model