I'm trying to train mobileNetV3Large with a simple PyTorch scheduler. This is the part of the code responsible for training:
bench_val_loss = 1000
bench_acc = 0.0
epochs = 15
optimizer = optim.Adam(embeddingNet.parameters(), lr=1e-3)
loss_optimizer = torch.optim.Adam(loss_fn.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=3, threshold=0.02)
for epoch in range(1, epochs + 1):
    print(f'current lr: {scheduler.get_last_lr()}')
    loss = train(embeddingNet, loss_fn, device, train_dataloader, optimizer, loss_optimizer, epoch)
    val_loss, accuracy = test(train_dataset, val_dataset, embeddingNet, accuracy_calculator, loss_fn, epoch, val_dataloader)
    #val_loss = simpleTest(train_dataset, val_dataset, embeddingNet, accuracy_calculator, loss_fn, epoch, val_dataloader)
    torch.save(embeddingNet.state_dict(), 'my/path/mobileNetV3L_ArcFaceLAST.pth')
    if accuracy >= bench_acc:
        # Update the benchmark so the BEST checkpoint is only overwritten when accuracy improves
        bench_acc = accuracy
        bench_val_loss = val_loss
        torch.save(embeddingNet.state_dict(), 'my/path/mobileNetV3L_ArcFaceBEST.pth')
    # ReduceLROnPlateau monitors whatever value is passed to step()
    scheduler.step(accuracy)
    writer.add_scalars('Training vs. Validation Loss',
                       {'Training': loss, 'Validation': val_loss},
                       global_step=epoch+1)
Here are the logs from the first 7 epochs of training:
Test set accuracy (Precision@1) = 0.17834772304046048
current lr: [0.001]
Epoch 3: Loss = 39.68284225463867
Epoch 3: valLoss = 39.9765007019043
100%|██████████| 962/962 [01:43<00:00, 9.28it/s]
100%|██████████| 370/370 [00:41<00:00, 8.92it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.31242593533096324
current lr: [0.001]
Epoch 4: Loss = 39.4412841796875
Epoch 4: valLoss = 39.67761562450512
100%|██████████| 962/962 [01:45<00:00, 9.11it/s]
100%|██████████| 370/370 [00:41<00:00, 8.86it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.3633824276282377
current lr: [0.001]
Epoch 5: Loss = 39.09823989868164
Epoch 5: valLoss = 39.54649614901156
100%|██████████| 962/962 [01:42<00:00, 9.37it/s]
100%|██████████| 370/370 [00:41<00:00, 8.87it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.44244117149145085
current lr: [0.001]
Epoch 6: Loss = 38.70449447631836
Epoch 6: valLoss = 39.1865906792718
100%|██████████| 962/962 [01:45<00:00, 9.15it/s]
100%|██████████| 370/370 [00:39<00:00, 9.25it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.5167597765363129
current lr: [0.0001]
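I don't understand why the scheduler decides to lower the learning rate even though the accuracy is improving by more than the threshold.
Where is the mistake?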
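When you use ReduceLROnPlateau with mode='min', the learning rate is reduced when the monitored quantity stops decreasing. With the default threshold_mode='rel', a step only counts as an improvement if the metric falls below best * (1 - threshold), which a rising accuracy never does, so every epoch is counted as a bad epoch; once more than patience=3 bad epochs accumulate, the learning rate is multiplied by factor=0.1. Since you want the monitored accuracy to increase, you should use mode='max', where an improvement is a value above best * (1 + threshold).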
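As a quick check of the answer, the sketch below feeds the five accuracy values printed in the log above into ReduceLROnPlateau with the question's settings (factor=0.1, patience=3, threshold=0.02), once with mode='min' and once with mode='max'. The helper name `lr_history` and the single dummy parameter standing in for `embeddingNet` are illustrative assumptions; only the optimizer and scheduler calls are standard PyTorch.

```python
import torch

# Accuracy values copied from the training log above, in the order they were printed.
LOGGED_ACCURACIES = [
    0.17834772304046048,
    0.31242593533096324,
    0.3633824276282377,
    0.44244117149145085,
    0.5167597765363129,
]


def lr_history(mode, accuracies, lr=1e-3):
    """Step a ReduceLROnPlateau scheduler on each accuracy; return the LR after each step."""
    # Hypothetical setup: one dummy parameter stands in for embeddingNet,
    # since the scheduler only changes the 'lr' entries in optimizer.param_groups.
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.Adam([param], lr=lr)
    # Same settings as the question; threshold_mode keeps its default, 'rel'.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode=mode, factor=0.1, patience=3, threshold=0.02)
    history = []
    for acc in accuracies:
        scheduler.step(acc)
        history.append(optimizer.param_groups[0]['lr'])
    return history


print("mode='min':", lr_history('min', LOGGED_ACCURACIES))
print("mode='max':", lr_history('max', LOGGED_ACCURACIES))
```

With mode='min', the learning rate should stay at 1e-3 for four steps and drop to 1e-4 on the fifth, which matches the drop to 0.0001 in the log; with mode='max', it should stay at 1e-3 throughout. In the original loop, this suggests changing only mode='min' to mode='max' and leaving scheduler.step(accuracy) as it is. Alternatively, keeping mode='min' and passing val_loss to scheduler.step() should also be consistent, since validation loss is a quantity you want to decrease.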