I'm trying to reproduce the results of an ML research paper on palm-vein recognition with CNNs. The paper trains three CNNs on several palm-vein datasets, one of which is the FYODB dataset.
I train my models from scratch in Keras. AlexNet trains fine, reaching over 95% test accuracy, but for some reason VGG16 and VGG19 fail to learn anything at all: their training accuracy never even reaches 0.1 in any epoch.
Below is the code I use to build and train the models. Note that the paper I'm reproducing intentionally reduces the number of filters in each Conv2D layer to cut training time (I've also tried the original filter counts, with the same result).
Some key constants: image_size and num_classes (their definitions aren't shown here; they're passed to the model builders below).
One more data point before the code: I've also tried transfer learning with pretrained VGG16 weights, and that actually works well, with over 95% test accuracy, so the data pipeline itself seems fine.
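That transfer setup looked roughly like this (the pooling/dropout head here is illustrative rather than my exact code; the ImageNet weights also require 3-channel inputs):

base = keras.applications.VGG16(
    weights="imagenet",      # pretrained ImageNet weights
    include_top=False,       # drop the original classifier head
    input_shape=image_size,
)
base.trainable = False       # freeze the convolutional base

inputs = keras.Input(shape=image_size)
x = keras.applications.vgg16.preprocess_input(inputs)  # VGG16's expected input scaling
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)
transfer_model = keras.Model(inputs, outputs)

Here is the code I use for building and training from scratch: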
import time

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
        layers.RandomContrast(0.1),
        layers.RandomTranslation(0.1, 0.1),
        layers.RandomHeight(0.1),
        layers.RandomWidth(0.1),
    ]
)
def make_vgg16_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Block 1
    x = data_augmentation(inputs)
    x = layers.Rescaling(1.0 / 255)(inputs)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
    # Block 2
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
    # Block 3
    x = layers.Conv2D(96, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(96, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(96, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
    # Block 4
    x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
    # Block 5
    x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
    # Flatten and Fully Connected Layers
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(4096, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return keras.Model(inputs, outputs)
from tqdm import tqdm

num_epochs = 30

models = {
    "AlexNet": make_alexnet_model(input_shape=image_size, num_classes=num_classes),
    "VGG16": make_vgg16_model(input_shape=image_size, num_classes=num_classes),
    "VGG19": make_vgg19_model(input_shape=image_size, num_classes=num_classes),
}

model_histories = {}
for name, model in models.items():
    print(f'\x1b[34mTraining {name} Model...\x1b[0m')
    model.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    start = time.time()
    # Wrap model.fit with tqdm for a progress bar
    progress_bar = tqdm(total=num_epochs, position=0, leave=True)
    history = model.fit(
        train_dataset,
        epochs=num_epochs,
        validation_data=val_dataset,
        verbose=1,
        callbacks=[
            tf.keras.callbacks.LambdaCallback(on_epoch_end=lambda epoch, logs: progress_bar.update(1)),
        ],
    )
    progress_bar.close()
    model_histories[name] = history
    end = time.time()
    print(f'Finished training {name} in {end-start:.2f}s\n')
Sample output:
Epoch 14/30
128/128 [==============================] - ETA: 0s - loss: 5.0713 - accuracy: 0.0054
47%|████▋ | 14/30 [05:35<06:16, 23.50s/it]
128/128 [==============================] - 23s 182ms/step - loss: 5.0713 - accuracy: 0.0054 - val_loss: 5.1037 - val_accuracy: 0.0023
Epoch 15/30
128/128 [==============================] - ETA: 0s - loss: 5.0709 - accuracy: 0.0081
50%|█████ | 15/30 [05:58<05:53, 23.55s/it]
The only thing in your code that jumps out at me is this:
# Block 1
x = data_augmentation(inputs) # This is not being used
x = layers.Rescaling(1.0 / 255)(inputs) # This is not being used
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)
It looks like the first two layers get thrown away: x is reassigned, but then never used, because the first Conv2D is applied to inputs rather than x. If that's right, the missing rescaling and augmentation would explain why this network is so hard to train.
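A minimal sketch of Block 1 with the preprocessing actually wired in (the Resizing layer is my addition, since RandomHeight/RandomWidth otherwise leave the static height/width unknown, which would break the later Flatten -> Dense stack):

# Block 1, chained properly
x = data_augmentation(inputs)
x = layers.Resizing(input_shape[0], input_shape[1])(x)  # restore a fixed spatial shape for Flatten/Dense
x = layers.Rescaling(1.0 / 255)(x)                      # now applied to the augmented batch
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)  # consumes x, not inputs
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)

In particular, without the Rescaling layer the convolutions see raw 0-255 pixel values, which is often enough to stall a VGG-depth network trained from scratch.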