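Since I start fine-tuning from the weights learned during transfer learning, I expected the loss to stay the same or go down. However, it looks like fine-tuning starts from a different set of weights.
Code to start transfer learning: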
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(units=3, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
epochs = 1000
callback = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train_generator,
                    steps_per_epoch=len(train_generator),
                    epochs=epochs,
                    validation_data=val_generator,
                    validation_steps=len(val_generator),
                    callbacks=[callback])
Output of the last epoch:
Epoch 29/1000
232/232 [==============================] - 492s 2s/step - loss: 0.1298 - accuracy: 0.8940 - val_loss: 0.1220 - val_accuracy: 0.8937
Code to start fine-tuning:
model.trainable = True
# Fine-tune from this layer onwards
fine_tune_at = -20
# Freeze all the layers before the `fine_tune_at` layer
for layer in model.layers[:fine_tune_at]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])
history_fine = model.fit(train_generator,
                         steps_per_epoch=len(train_generator),
                         epochs=epochs,
                         validation_data=val_generator,
                         validation_steps=len(val_generator),
                         callbacks=[callback])
But this is what I see for the first few epochs:
Epoch 1/1000
232/232 [==============================] - ETA: 0s - loss: 0.3459 - accuracy: 0.8409/usr/local/lib/python3.7/dist-packages/PIL/Image.py:960: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
"Palette images with Transparency expressed in bytes should be "
232/232 [==============================] - 509s 2s/step - loss: 0.3459 - accuracy: 0.8409 - val_loss: 0.7755 - val_accuracy: 0.7262
Epoch 2/1000
232/232 [==============================] - 502s 2s/step - loss: 0.1889 - accuracy: 0.9066 - val_loss: 0.5628 - val_accuracy: 0.8881
Eventually the loss drops below the transfer-learning loss:
Epoch 87/1000
232/232 [==============================] - 521s 2s/step - loss: 0.0232 - accuracy: 0.8312 - val_loss: 0.0481 - val_accuracy: 0.8563
Why is the loss in the first fine-tuning epoch higher than the last loss from transfer learning?
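According to the TensorFlow/Keras guide on transfer learning and fine-tuning, the parameters of the Batch Norm layers should be left unchanged:
"Importantly, although the base model becomes trainable, it is still running in inference mode since we passed training=False when calling it when we built the model. This means that the batch normalization layers inside won't update their batch statistics. If they did, they would wreak havoc on the representations learned by the model so far."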
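The pattern that the quoted passage describes can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the `build_model` helper, its `weights` parameter, and the 128x128 `IMG_SHAPE` are assumptions made for the example, while the three-unit sigmoid head mirrors the question.

```python
import tensorflow as tf

IMG_SHAPE = (128, 128, 3)  # assumed input size; the question does not state it


def build_model(weights="imagenet"):
    """Hypothetical helper: build the classifier with the functional API."""
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=IMG_SHAPE, include_top=False, weights=weights)
    base_model.trainable = False  # frozen for the transfer-learning phase

    inputs = tf.keras.Input(shape=IMG_SHAPE)
    # training=False is fixed at call time, so the BatchNormalization layers
    # inside base_model keep using their stored moving statistics even after
    # base_model.trainable is later set back to True.
    x = base_model(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(3, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs), base_model
```

With a model built this way, fine-tuning would set `base_model.trainable = True` and recompile with a low learning rate. A `Sequential` model, as used in the question, gives no place to pass `training=False` to the base model.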
Here is what I did to fix the sudden jump in loss after unfreezing layers:
from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNet

img_width, img_height, num_channel = 128, 128, 3

conv_base = MobileNet(
    include_top=False,
    input_shape=(img_width, img_height, num_channel),
    pooling="avg")

# Keep the model-level flag True: when it is False, Keras reports no
# trainable weights regardless of the per-layer flags set below.
conv_base.trainable = True

# Freeze everything below the top 50 layers
for layer in conv_base.layers[:-50]:
    layer.trainable = False

# In the top 50 layers, unfreeze everything except BatchNormalization
for layer in conv_base.layers[-50:]:
    layer.trainable = not isinstance(layer, layers.BatchNormalization)

conv_base.summary(show_trainable=True)  # check each layer's trainability
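One detail in the question's fine-tuning code may also be worth checking. There, `model` is a `Sequential` with only three layers, so `model.layers[:-20]` is an empty slice and the freezing loop does nothing. That would leave the whole base model, BatchNormalization layers included, trainable. The sketch below shows the slicing behaviour and a possible helper that works on the base model's own layers. `unfreeze_top` and the small `toy_base` stand-in network are hypothetical names used only for illustration.

```python
import tensorflow as tf

# Three outer layers, as in the question's Sequential model.
outer_layers = ["mobilenetv2", "global_average_pooling2d", "dense"]
frozen = outer_layers[:-20]  # the slice stops before index 0, so it is empty
print(frozen)  # []


def unfreeze_top(base_model, n_layers):
    """Hypothetical helper: train only the top n_layers, keeping BN frozen."""
    base_model.trainable = True
    for layer in base_model.layers[:-n_layers]:
        layer.trainable = False
    for layer in base_model.layers[-n_layers:]:
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False


# Small stand-in for a pretrained base, so the example needs no download.
toy_base = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Conv2D(4, 3),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Conv2D(4, 3),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
])
unfreeze_top(toy_base, 3)
trainable_names = [layer.name for layer in toy_base.layers if layer.trainable]
```

Applied to the question, the call would be something like `unfreeze_top(base_model, 20)` before recompiling, rather than slicing `model.layers`.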