Why does my PyTorch scheduler not seem to be working correctly?


I am trying to train mobileNetV3Large with a simple PyTorch scheduler. Here is the part of the code responsible for training:

bench_val_loss = 1000
bench_acc = 0.0
epochs = 15
optimizer = optim.Adam(embeddingNet.parameters(), lr=1e-3) 
loss_optimizer = torch.optim.Adam(loss_fn.parameters(), lr=1e-3)

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=3, threshold=0.02)

for epoch in range(1, epochs + 1):

    print(f'current lr: {scheduler.get_last_lr()}')
    loss = train(embeddingNet, loss_fn, device, train_dataloader, optimizer, loss_optimizer, epoch)
    val_loss, accuracy = test(train_dataset, val_dataset, embeddingNet, accuracy_calculator, loss_fn, epoch, val_dataloader)
    #val_loss = simpleTest(train_dataset, val_dataset, embeddingNet, accuracy_calculator, loss_fn, epoch, val_dataloader)

    
    torch.save(embeddingNet.state_dict(), 'my/path/mobileNetV3L_ArcFaceLAST.pth')

    if accuracy >= bench_acc:
      bench_acc = accuracy   # track the best accuracy seen so far
      bench_val_loss = val_loss
      torch.save(embeddingNet.state_dict(), 'my/path/mobileNetV3L_ArcFaceBEST.pth')

    scheduler.step(accuracy)

    writer.add_scalars('Training vs. Validation Loss',
                       {'Training': loss, 'Validation': val_loss},
                       global_step=epoch+1)

Here you can find the training logs for the first 7 epochs:

Test set accuracy (Precision@1) = 0.17834772304046048
current lr: [0.001]
Epoch 3: Loss = 39.68284225463867
Epoch 3: valLoss = 39.9765007019043
100%|██████████| 962/962 [01:43<00:00,  9.28it/s]
100%|██████████| 370/370 [00:41<00:00,  8.92it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.31242593533096324
current lr: [0.001]
Epoch 4: Loss = 39.4412841796875
Epoch 4: valLoss = 39.67761562450512
100%|██████████| 962/962 [01:45<00:00,  9.11it/s]
100%|██████████| 370/370 [00:41<00:00,  8.86it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.3633824276282377
current lr: [0.001]
Epoch 5: Loss = 39.09823989868164
Epoch 5: valLoss = 39.54649614901156
100%|██████████| 962/962 [01:42<00:00,  9.37it/s]
100%|██████████| 370/370 [00:41<00:00,  8.87it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.44244117149145085
current lr: [0.001]
Epoch 6: Loss = 38.70449447631836
Epoch 6: valLoss = 39.1865906792718
100%|██████████| 962/962 [01:45<00:00,  9.15it/s]
100%|██████████| 370/370 [00:39<00:00,  9.25it/s]
Computing accuracy
Test set accuracy (Precision@1) = 0.5167597765363129
current lr: [0.0001]

I don't understand why the scheduler decided to lower the learning rate, even though the accuracy is growing faster than the threshold.

Where is the mistake?

python machine-learning pytorch conv-neural-network scheduler
1 Answer

When you use ReduceLROnPlateau with mode='min', the learning rate is reduced when the monitored quantity stops decreasing. Since the metric you pass to scheduler.step() is accuracy, which you want to increase, you should use mode='max' instead. With mode='min', your steadily rising accuracy never counts as an improvement, so after patience=3 epochs without "improvement" the scheduler cuts the learning rate by factor=0.1, which is exactly the drop to 0.0001 you see in your log.
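A minimal sketch of the corrected setup (the Linear model and the fake accuracy values below are placeholders, not part of the question's code; the scheduler hyperparameters are the ones from the question):

import torch

# Placeholder model/optimizer standing in for embeddingNet and its Adam optimizer.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# mode='max': the learning rate is reduced only when the monitored metric stops INCREASING.
# With the default threshold_mode='rel', an epoch counts as an improvement only if
# accuracy > best * (1 + threshold).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1, patience=3, threshold=0.02
)

for epoch in range(1, 8):
    accuracy = 0.1 * epoch                # stand-in for your validation accuracy
    scheduler.step(accuracy)              # pass the metric that mode refers to
    print(epoch, optimizer.param_groups[0]['lr'])

With a steadily improving accuracy sequence like this, the learning rate stays at 1e-3; with your original mode='min' the same sequence is treated as "no improvement" and the rate is cut after patience epochs, just as in your log.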
