Loss at the start of fine-tuning is higher than the final transfer-learning loss

Question

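Since I start fine-tuning from the weights learned during transfer learning, I expected the loss to be the same or lower. However, it looks like fine-tuning starts from a different set of initial weights.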

Code to start transfer learning:

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                              include_top=False, 
                                              weights='imagenet')
base_model.trainable = False

model = tf.keras.Sequential([
  base_model,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(units=3, activation='sigmoid')
])

model.compile(optimizer='adam', 
              loss='binary_crossentropy', 
              metrics=['accuracy'])

epochs = 1000
callback = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train_generator,
                    steps_per_epoch=len(train_generator), 
                    epochs=epochs,
                    validation_data=val_generator,
                    validation_steps=len(val_generator),
                    callbacks=[callback],)

Output from the last epoch:

Epoch 29/1000
232/232 [==============================] - 492s 2s/step - loss: 0.1298 - accuracy: 0.8940 - val_loss: 0.1220 - val_accuracy: 0.8937

Code to start fine-tuning:

model.trainable = True

# Fine-tune from this layer onwards
fine_tune_at = -20

# Freeze all the layers before the `fine_tune_at` layer
for layer in model.layers[:fine_tune_at]:
  layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

history_fine = model.fit(train_generator,
                         steps_per_epoch=len(train_generator), 
                         epochs=epochs,
                         validation_data=val_generator,
                         validation_steps=len(val_generator),
                         callbacks=[callback],)

But this is what I see in the first few epochs:

Epoch 1/1000
232/232 [==============================] - ETA: 0s - loss: 0.3459 - accuracy: 0.8409/usr/local/lib/python3.7/dist-packages/PIL/Image.py:960: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
  "Palette images with Transparency expressed in bytes should be "
232/232 [==============================] - 509s 2s/step - loss: 0.3459 - accuracy: 0.8409 - val_loss: 0.7755 - val_accuracy: 0.7262
Epoch 2/1000
232/232 [==============================] - 502s 2s/step - loss: 0.1889 - accuracy: 0.9066 - val_loss: 0.5628 - val_accuracy: 0.8881

Eventually the loss decreases and beats the transfer-learning loss:

Epoch 87/1000
232/232 [==============================] - 521s 2s/step - loss: 0.0232 - accuracy: 0.8312 - val_loss: 0.0481 - val_accuracy: 0.8563

Why is the loss in the first fine-tuning epoch higher than the final loss from transfer learning?
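One way to test the hypothesis that fine-tuning starts from different weights is to compare the weights, and the evaluation loss, before and after recompiling. The sketch below is only an illustration: it assumes TensorFlow 2.x and uses a tiny hypothetical Dense model on random data instead of the MobileNetV2 model above. Recompiling installs a new optimizer but does not reinitialize the learned weights.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model (hypothetical; not the MobileNetV2 model from the question)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Random placeholder data, just so the model has trained weights to compare
x = np.random.rand(8, 4).astype('float32')
y = (np.random.rand(8, 3) > 0.5).astype('float32')
model.fit(x, y, epochs=1, verbose=0)

weights_before = [w.copy() for w in model.get_weights()]
loss_before = model.evaluate(x, y, verbose=0)

# Recompile with a new, smaller learning rate, mirroring the fine-tuning step
model.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='binary_crossentropy')

weights_after = model.get_weights()
loss_after = model.evaluate(x, y, verbose=0)

same = all(np.array_equal(a, b) for a, b in zip(weights_before, weights_after))
print(same, loss_before, loss_after)  # same is True; the two losses match
```

Because evaluate() runs in inference mode, a loss that matches here but jumps during the first training epoch suggests that some layers behave differently in training mode, such as the Batch Norm layers discussed in the answer.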

Answer 1

According to the TensorFlow/Keras page on transfer learning and fine-tuning, the parameters of the Batch Norm layers should be preserved (kept frozen):

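"Importantly, although the base model becomes trainable, it is still running in inference mode since we passed training=False when calling it when we built the model. This means that the batch normalization layers inside won't update their batch statistics. If they did, they would wreak havoc on the representations learned by the model so far."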
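A minimal sketch of the model construction the quoted guide describes, adapted to the question's setup: instead of placing base_model inside a Sequential, it is called with training=False inside a functional model. IMG_SHAPE = (96, 96, 3) and weights=None are assumptions made only so the snippet runs without downloading ImageNet weights; the question uses weights='imagenet'.

```python
import tensorflow as tf

IMG_SHAPE = (96, 96, 3)  # assumed input shape for illustration

# weights=None avoids a download here; use weights='imagenet' in practice
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights=None)
base_model.trainable = False

inputs = tf.keras.Input(shape=IMG_SHAPE)
# training=False is recorded with this call, so (per the guide) the
# BatchNormalization layers inside keep using their moving statistics
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(3, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)

# ... train the new head first, then unfreeze the base for fine-tuning:
base_model.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```

The guide describes this pattern for tf.keras in TensorFlow 2.x; if you are on a different Keras version, it is worth checking how it treats an explicit training=False on a nested model.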

Below is what I did; it fixed the sudden jump in loss after unfreezing layers:

from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNet

img_width, img_height, num_channel = 128, 128, 3
conv_base = MobileNet(
    include_top=False,
    input_shape=(img_width, img_height, num_channel),
    pooling="avg")

# Keep the container itself trainable: in tf.keras (TF 2.x) a model whose own
# trainable flag is False exposes no trainable weights, whatever its sub-layers say
conv_base.trainable = True

for layer in conv_base.layers[:-50]:  # freeze everything below the top 50 layers
    layer.trainable = False

for layer in conv_base.layers[-50:]:  # unfreeze 50 layers from the top...
    # ...except BatchNormalization layers, which stay frozen (inference mode)
    layer.trainable = not isinstance(layer, layers.BatchNormalization)

conv_base.summary(show_trainable=True)  # check each layer's trainability
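A related detail in the question's fine-tuning code may also be worth checking. There, model is a Sequential with only three entries in model.layers (the MobileNetV2 base, the pooling layer, and the Dense head), so model.layers[:fine_tune_at] with fine_tune_at = -20 is an empty slice, and after model.trainable = True nothing is re-frozen: the whole base, Batch Norm layers included, gets fine-tuned. The sketch below reproduces this, assuming IMG_SHAPE = (96, 96, 3) and weights=None purely to avoid a download, and then applies the slice to base_model.layers instead while keeping BatchNormalization layers frozen, in the spirit of the answer's code.

```python
import tensorflow as tf

IMG_SHAPE = (96, 96, 3)  # assumed; the question does not show IMG_SHAPE

# weights=None only to avoid a download here; the question uses 'imagenet'
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights=None)
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(units=3, activation='sigmoid'),
])

fine_tune_at = -20

# The outer model has only 3 layers, so this slice selects nothing
print(len(model.layers), len(model.layers[:fine_tune_at]))  # 3 0

# Slice the base model's own layers instead
base_model.trainable = True
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

# In the unfrozen top block, keep BatchNormalization layers frozen as well
for layer in base_model.layers[fine_tune_at:]:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False

trainable = [layer.name for layer in base_model.layers if layer.trainable]
print(trainable)
```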
