What is the difference between Dropout(1.0) and stop_gradient?


Consider these two architectures:

prev_layer -> dropout 1.0 -> next_layer (output layer)
prev_layer -> stop_gradient -> next_layer (output layer)

When gradients flow from the output layer back toward the input, both should behave the same way: the weights of prev_layer are never updated. So what is the difference between them?

I verified this with the following code:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dropout, Lambda
from tensorflow.keras.models import Model

input_layer = Input(shape=(1,))
prev_layer = Dense(32, activation='relu')(input_layer)
dropout_layer = Dropout(.99999)(prev_layer)  # drops (almost) every unit, effectively zeroing this layer's output during training
output_layer = Dense(1, activation='linear')(dropout_layer)
model_dropout = Model(inputs=input_layer, outputs=output_layer)
model_dropout.compile(optimizer='adam', loss='mse')

input_layer = Input(shape=(1,))
prev_layer = Dense(32, activation='relu')(input_layer)
stop_gradient_layer = Lambda(lambda x: tf.stop_gradient(x))(prev_layer)
output_layer = Dense(1, activation='linear')(stop_gradient_layer)
model_stopgradient = Model(inputs=input_layer, outputs=output_layer)
model_stopgradient.compile(optimizer='adam', loss='mse')

Training them:

before_train_dropout = model_dropout.layers[1].get_weights()
before_train_stopgradient = model_stopgradient.layers[1].get_weights()

X_dummy = np.random.rand(5, 1)
y_dummy = np.random.rand(5, 1)

model_dropout.fit(X_dummy, y_dummy, epochs=50, verbose=0)
model_stopgradient.fit(X_dummy, y_dummy, epochs=50, verbose=0)

after_train_dropout = model_dropout.layers[1].get_weights()
after_train_stopgradient = model_stopgradient.layers[1].get_weights()

# Check whether the first Dense layer's weights changed during training
print('weight')
print(np.array_equal(before_train_dropout[0], after_train_dropout[0]))
print(np.array_equal(before_train_stopgradient[0], after_train_stopgradient[0]))
print('bias')
print(np.array_equal(before_train_dropout[1], after_train_dropout[1]))
print(np.array_equal(before_train_stopgradient[1], after_train_stopgradient[1]))

This prints:

weight
True
True

bias
True
True
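
The same conclusion can be reached more directly with tf.GradientTape, without going through fit. Here is a minimal self-contained sketch (my own check, not part of the verification above; layer sizes are arbitrary):

import tensorflow as tf

dense = tf.keras.layers.Dense(4)
dropout = tf.keras.layers.Dropout(0.99999)
x = tf.random.normal((3, 2))

# Gradient of a scalar loss w.r.t. the Dense kernel, through Dropout
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(dropout(dense(x), training=True))
print(tape.gradient(loss, dense.kernel))  # (almost surely) a tensor of all zeros

# Same loss, but through stop_gradient
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.stop_gradient(dense(x)))
print(tape.gradient(loss, dense.kernel))  # None: the backward graph is cut entirely

Note the subtle distinction: the Dropout variant produces a gradient of zeros, while stop_gradient produces no gradient at all (Keras typically warns that gradients do not exist for those variables and skips them). With plain Adam, a zero gradient also results in a zero update, so both leave prev_layer's weights unchanged, matching the result above.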

So, when should I use Dropout(1.0) and when should I use stop_gradient?

python arrays numpy tensorflow keras
1 Answer

Although they do the same thing in the backward pass, they behave very differently in the forward pass.

A stop_gradient layer passes its input tensor through to the output unchanged.

A Dropout layer, during training, zeroes each element with probability rate and scales the surviving elements by 1/(1 - rate); with rate = 1 (or arbitrarily close to it), every output is zero. At inference time (training=False), Dropout is a no-op.
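
A quick forward-pass comparison makes the difference concrete (a small illustrative sketch, again using a rate of 0.99999 to stand in for 1.0, as in the question):

import tensorflow as tf

x = tf.ones((1, 4))
drop = tf.keras.layers.Dropout(0.99999)

print(tf.stop_gradient(x).numpy())      # [[1. 1. 1. 1.]]: identity
print(drop(x, training=True).numpy())   # [[0. 0. 0. 0.]] (almost surely): everything dropped
print(drop(x, training=False).numpy())  # [[1. 1. 1. 1.]]: Dropout is the identity at inference

So during training, the Dropout(~1.0) model feeds (almost surely) all zeros into the output layer, which then only learns its bias, while the stop_gradient model feeds the real activations forward; at inference time, the two behave identically.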
