I am developing an LSTM autoencoder model for anomaly detection. My Keras model is set up as follows:
from keras.models import Sequential
from keras import Model, layers
from keras.layers import Layer, Conv1D, Input, Masking, Dense, RNN, LSTM, Dropout, RepeatVector, TimeDistributed, Reshape

def create_RNN_with_attention():
    x = Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
    attention_layer = attention()(RNN_layer_1)
    dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
    repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
    dropout_layer_1 = Dropout(rate=0.2)(RNN_layer_2)
    output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_1)
    model = Model(x, output)
    model.compile(loss='mae', optimizer='adam')
    return model
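Note the attention layer I added, attention_layer. Before adding it, the model compiled perfectly, but after adding attention_layer the model throws the following error: ValueError: Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1
My attention layer is set up as follows: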
import keras.backend as K

class attention(Layer):
    def __init__(self, **kwargs):
        super(attention, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super(attention, self).build(input_shape)

    def call(self, x):
        # Alignment scores. Pass them through tanh function
        e = K.tanh(K.dot(x, self.W) + self.b)
        # Remove dimension of size 1
        e = K.squeeze(e, axis=-1)
        # Compute the weights
        alpha = K.softmax(e)
        # Reshape to TensorFlow format
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector
        context = x * alpha
        context = K.sum(context, axis=1)
        return context
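The idea of the attention mask is to let the model focus on the more prominent features as it trains.
Why am I getting the error above, and how can I fix it?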
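I think the problem lies in this line:
RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
This layer outputs a tensor of shape (batch_size, 64). That means you output a single vector per sample and then run the attention mechanism with respect to the batch dimension instead of the sequence dimension. The sum over axis=1 inside the attention layer then collapses that 2-D tensor to 1-D, so the output has lost its batch dimension, which Keras layers do not accept. That is why the RepeatVector layer raises the error: it expects a 2-D input of shape (batch_size, features).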
If you want to run the attention mechanism over the sequence, change the line mentioned above to:
RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
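To see why the input rank matters, the shapes can be traced by hand. The following is only a NumPy sketch that mirrors the steps in the attention layer's call(); it is not Keras code. The helper names (softmax, attention_forward) and the sizes are made up for illustration. In the 2-D case a batch of one is used because NumPy needs concrete sizes to broadcast, so it only approximates what Keras infers with an unknown batch size.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_forward(x, W, b):
    """NumPy stand-in (an assumption) for the call() of the attention layer."""
    e = np.tanh(x @ W + b)                  # alignment scores
    e = np.squeeze(e, axis=-1)              # drop the trailing axis of size 1
    alpha = softmax(e)                      # weights over the last remaining axis
    alpha = np.expand_dims(alpha, axis=-1)  # restore the trailing axis
    context = x * alpha                     # weight the inputs
    return context.sum(axis=1)              # sum over axis 1

rng = np.random.default_rng(0)
batch, timesteps, units = 4, 10, 64         # illustrative sizes only

W = rng.normal(size=(units, 1))             # shape=(input_shape[-1], 1)

# Case 1: return_sequences=True, so the layer sees (batch, timesteps, units).
x_seq = rng.normal(size=(batch, timesteps, units))
b_seq = np.zeros((timesteps, 1))            # shape=(input_shape[1], 1)
context_seq = attention_forward(x_seq, W, b_seq)
print(context_seq.shape)                    # (4, 64): 2-D, what RepeatVector expects

# Case 2: return_sequences=False, so the layer sees (batch, units).
x_vec = rng.normal(size=(1, units))         # batch of one stands in for the unknown batch size
b_vec = np.zeros((units, 1))                # input_shape[1] is now the feature axis
context_vec = attention_forward(x_vec, W, b_vec)
print(context_vec.shape)                    # (64,): 1-D, the batch axis is gone
```

In the first case axis 1 is the time axis, so the sum yields one context vector per sample. In the second case axis 1 is the feature axis, and summing over it leaves a 1-D result, which matches the "found ndim=1" in the error message.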
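In attention models, the RepeatVector layer is usually not used. That layer repeats its input vector once for each output time step. With an attention mechanism, however, there is no need to repeat the output vector, because the importance weights already apply across all time steps.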
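More specifically, in your model the output of the LSTM layer RNN_layer_1 is first passed to the attention layer. The attention mechanism then determines an importance for each time step (this is where return_sequences=True, and RepeatVector to repeat the vector, come into play). Finally, the TimeDistributed Dense layer computes the output at each time step.
So the RepeatVector layer is not needed here and should be removed. Note that this holds only if the attention layer returns the weighted sequence; the layer as written sums over the time axis and returns a (batch_size, 64) tensor, in which case the second LSTM still needs a 3-D input.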
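To make the role of RepeatVector concrete, here is a small NumPy sketch of the shape change it performs: a 2-D (batch_size, features) input becomes a 3-D (batch_size, n, features) output. The function repeat_vector is a hypothetical stand-in written for this illustration, not the Keras implementation, and its error message only imitates the wording of the one in the question.

```python
import numpy as np

def repeat_vector(v, n):
    """NumPy stand-in (an assumption) for RepeatVector(n).

    Maps (batch_size, features) -> (batch_size, n, features).
    """
    if v.ndim != 2:
        # Imitates the rank check that produced the error in the question.
        raise ValueError(f"expected ndim=2, found ndim={v.ndim}")
    # Insert a time axis, then copy the vector n times along it.
    return np.repeat(v[:, np.newaxis, :], n, axis=1)

context = np.arange(6, dtype=float).reshape(2, 3)  # 2 samples, 3 features
repeated = repeat_vector(context, n=4)
print(repeated.shape)  # (2, 4, 3)
```

Every time step of the result holds the same vector, which is what a decoder LSTM with return_sequences=True then consumes as its input sequence.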