I am trying to fine-tune a Transformer-based computer vision model, specifically the microsoft/swinv2-large-patch4-window12to16-192to256-22kto1k-ft model available through HuggingFace.
My training procedure is somewhat involved:

- The model I am training has a main output head for image classification, plus 8 supplementary output heads that predict attributes annotated on a synthetic dataset, including features such as object pose and lighting direction.
- Training uses both real and synthetic images. In each batch, the model is first trained on a batch of synthetic images, using an optimiser that updates all of the model's trainable parameters. It is then trained on a batch of ImageNet images using a "different" optimiser that only affects the parameters of the main classification head (the other attributes are not annotated in ImageNet). This two-stage approach is implemented as a form of replay, to avoid catastrophic forgetting.

Unfortunately, I have run into a problem I cannot resolve. Looking at the training and validation accuracies and losses that I log to TensorBoard, I see that the training accuracy of every task decreases over time while the training loss increases. Meanwhile, the validation accuracy and loss of both the real and the synthetic tasks remain completely constant.
Some sample plots covering 14 epochs of training are shown in the figure. All tasks are discretised, and the loss function used for every task is categorical cross-entropy. For the multi-task loss, the per-task losses are combined using a weighted sum; for the ImageNet-only loss, the raw categorical cross-entropy is used.
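For concreteness, multi_loss_fn is conceptually a weighted sum of per-task cross-entropies, along these lines (a minimal sketch; the task names and weights here are placeholders, and my real implementation is among the omitted steps below):

from torch import nn

class MultiTaskLoss(nn.Module):
    """Weighted sum of per-task categorical cross-entropy losses."""

    def __init__(self, weights):
        super().__init__()
        # weights: e.g. {"classification": 1.0, "pose": 0.5, "lighting": 0.5, ...}
        self.weights = weights
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, outputs, batch):
        # outputs[task]: logits for that task; batch[task]: integer class labels
        return sum(
            w * self.loss_fn(outputs[task], batch[task])
            for task, w in self.weights.items()
        )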
A simplified version of the code is provided below (with comments describing the omitted steps):
import torch
from itertools import chain
from peft import LoraConfig, get_peft_model
from torch.optim.lr_scheduler import StepLR

# Custom datasets are defined and initialised
# Custom model is defined and initialised
# Custom loss functions are defined and initialised
# Base model is loaded from HuggingFace and passed into the custom model

# LoRA is applied to the model
config = LoraConfig(
    r=20,
    lora_alpha=20,
    target_modules=["query", "value"],
    lora_dropout=0.2,
    bias="none",
    modules_to_save=[
        "dense_layer",
        "additional_dense_layer",
        "classifier",
        # Names of the other output layers
    ],
)
model = get_peft_model(model, config)
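# (Diagnostic suggestion, not part of my original run:) PEFT can report what
# remains trainable after wrapping, to confirm that the LoRA matrices and the
# modules_to_save heads are unfrozen while the rest of the backbone is frozen.
model.print_trainable_parameters()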
# Model is distributed over 3 GPUs using DistributedDataParallel
# The two optimisers are created
# One optimiser for updating all parameters on synthetic images
all_parameters = model.parameters()
optimizer_all = torch.optim.Adam(all_parameters, lr=0.01)
# And another for updating only the classification head on ImageNet images
lora_parameters = (
    param
    for name, param in model.module.named_parameters()
    if ("lora_A" in name or "lora_B" in name)
)
imagenet_parameters = chain(
    lora_parameters,
    model.module.classifier.parameters(),
    model.module.dense_layer.parameters(),
    model.module.additional_dense_layer.parameters(),
)
optimizer_imagenet = torch.optim.Adam(imagenet_parameters, lr=0.01)
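# Note: optimizer_imagenet covers the LoRA matrices plus the saved output
# layers, which are also covered by optimizer_all above; each Adam instance
# keeps its own independent momentum/variance state for these shared tensors.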
# Initialize the StepLR schedulers
scheduler_all = StepLR(optimizer_all, step_size=3, gamma=0.5)
scheduler_imagenet = StepLR(optimizer_imagenet, step_size=3, gamma=0.5)
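# With step_size=3 and gamma=0.5, both learning rates start at 0.01 and are
# halved every third epoch: 0.01 for epochs 0-2, 0.005 for epochs 3-5, etc.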
# Training loop (many logging steps omitted)
for epoch in range(start_epoch, args.stop_epoch):
    # Set the epoch for DistributedSampler
    synth_train_dl.sampler.set_epoch(epoch)
    imagenet_train_dl.sampler.set_epoch(epoch)
    for i, (synth_batch, imagenet_batch) in enumerate(
        zip(synth_train_dl, imagenet_train_dl)
    ):
        ###################
        # Synthetic stage
        # Move the batch tensors to the same device as the model
        synth_batch = {k: v.to(rank) for k, v in synth_batch.items()}
        # Zero the gradients
        optimizer_all.zero_grad()
        # Perform the forward pass
        synth_outputs = model(synth_batch["image"])
        # Compute the loss
        synth_loss = multi_loss_fn(synth_outputs, synth_batch)
        # Perform the backward pass
        synth_loss.backward()
        # Update the weights
        optimizer_all.step()
        ###################
        # ImageNet stage
        # Move the batch tensors to the same device as the model
        imagenet_batch = {k: v.to(rank) for k, v in imagenet_batch.items()}
        # Zero the gradients
        optimizer_imagenet.zero_grad()
        # Perform the forward pass
        imagenet_outputs = model(imagenet_batch["image"])
        # Compute the loss
        imagenet_loss = single_loss_fn(imagenet_outputs, imagenet_batch)
        # Perform the backward pass
        imagenet_loss.backward()
        # Update the weights
        optimizer_imagenet.step()
    # Step the schedulers once per epoch
    scheduler_all.step()
    scheduler_imagenet.step()
    # After each epoch, evaluate the model on the validation set
    model.eval()
    val_task_losses = {task: 0 for task in multi_loss_fn.weights.keys()}
    val_task_accuracies = {task: 0 for task in multi_loss_fn.weights.keys()}
    with torch.no_grad():
        val_loss = 0
        imagenet_loss = 0
        for i, (synth_batch, imagenet_batch) in enumerate(
            zip(synth_val_dl, imagenet_val_dl)
        ):
            ###################
            # Synthetic validation
            # Move the batch tensors to the same device as the model
            synth_batch = {k: v.to(rank) for k, v in synth_batch.items()}
            synth_outputs = model(synth_batch["image"])
            loss = multi_loss_fn(synth_outputs, synth_batch)
            val_loss += loss.item()
            # Accumulate loss and accuracy for each task on the validation set
            for task in multi_loss_fn.weights.keys():
                task_loss = multi_loss_fn.loss_fn(
                    synth_outputs[task], synth_batch[task]
                )
                task_acc = compute_accuracy(synth_outputs[task], synth_batch[task])
                val_task_losses[task] += task_loss.item()
                val_task_accuracies[task] += task_acc
            ###################
            # ImageNet validation
            # Move the batch tensors to the same device as the model
            imagenet_batch = {k: v.to(rank) for k, v in imagenet_batch.items()}
            imagenet_outputs = model(imagenet_batch["image"])
            loss = single_loss_fn(imagenet_outputs, imagenet_batch)
            imagenet_loss += loss.item()
    # Average the per-task metrics over the validation set and log them
    for task in multi_loss_fn.weights.keys():
        avg_task_loss = val_task_losses[task] / len(synth_val_dl)
        avg_task_acc = val_task_accuracies[task] / len(synth_val_dl)
        writer.add_scalar(f"Loss/val/{task}", avg_task_loss, epoch)
        writer.add_scalar(f"Accuracy/val/{task}", avg_task_acc, epoch)
    # Write the overall validation losses to TensorBoard
    val_loss /= len(synth_val_dl)
    writer.add_scalar("Synthetic Loss/val", val_loss, epoch)
    imagenet_loss /= len(imagenet_val_dl)
    writer.add_scalar("ImageNet Loss/val", imagenet_loss, epoch)
    # Print the validation loss
    print(f"Epoch {epoch+1}/{args.stop_epoch}, Validation Loss: {val_loss}")
    # Save the model and optimiser states
    save_checkpoint = {
        "epoch": epoch + 1,
        "model_state_dict": model.state_dict(),
        "all_optimizer_state_dict": optimizer_all.state_dict(),
        "single_optimizer_state_dict": optimizer_imagenet.state_dict(),
    }
    torch.save(save_checkpoint, f"checkpoints/checkpoint_epoch_{epoch+1}.pth")
    model.train()
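(For completeness: compute_accuracy, defined with the other omitted helpers, is just a plain top-1 accuracy function, essentially the following.)

def compute_accuracy(logits, labels):
    # Fraction of samples whose highest-scoring logit matches the label
    preds = logits.argmax(dim=-1)
    return (preds == labels).float().mean().item()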
Unfortunately, deploying and training this model is a very slow process, and I have little capacity for trial and error. I would be very grateful if anyone could point out what in my code might cause this behaviour, and I am happy to provide further information.
Initially I suspected that my learning rate was too low, so that the parameters were not changing sufficiently, but I now start from a learning rate of 0.01, which I consider fairly large, and still see the same problem.
The fact that the validation loss is "exactly the same" across epochs leads me to believe that the model is not training "at all". Given the results, this seems plausible: the datasets are shuffled during training, which could account for the high variance of the training loss. But I do not understand why this is happening to my model. I am also certain that the parameter sets passed to the optimisers are non-empty; see the check below.
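For what it's worth, this is roughly how I convinced myself of that (a quick sketch; count_params is a throwaway helper, not part of my training code):

def count_params(params):
    # Materialise the iterable and count the tensors that will actually
    # receive gradient updates
    params = list(params)
    trainable = [p for p in params if p.requires_grad]
    return len(params), len(trainable)

n_all, n_all_trainable = count_params(model.parameters())
print(f"optimizer_all: {n_all} tensors, {n_all_trainable} trainable")

n_lora, n_lora_trainable = count_params(
    param
    for name, param in model.module.named_parameters()
    if ("lora_A" in name or "lora_B" in name)
)
print(f"LoRA subset: {n_lora} tensors, {n_lora_trainable} trainable")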
Given my inexperience, I worry that this may be a simple oversight. Once again, any help is greatly appreciated.