RuntimeError: "element 0 of tensors does not require grad and does not have a grad_fn"

Problem description

I am running into a problem while training a comment-classification model with PyTorch Lightning and a pretrained BERT model.

I get the following error during training:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

For context, I have already enabled gradients for all of the model's parameters with the function enable_gradients(model). However, the error persists.

The model is based on the aubmindlab/bert-base-arabertv02-twitter pretrained checkpoint, and I noticed that some of the BERT weights are not correctly initialized when the model is loaded. I have made sure I am using the latest versions of PyTorch, Transformers, and PyTorch Lightning.
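For reference, the encoder is loaded roughly like this (a sketch only; the actual UCC_Comment_Classifier is not shown, so the classification head below is an assumption). A warning that some weights are newly initialized is typically expected when a fresh task head is placed on top of a base checkpoint:

import torch.nn as nn
from transformers import AutoModel

# Sketch only: the real UCC_Comment_Classifier may build its head differently.
encoder = AutoModel.from_pretrained("aubmindlab/bert-base-arabertv02-twitter")
head = nn.Linear(encoder.config.hidden_size, len(attributes))  # attributes as passed to the datamodule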

Before training my specific model, I also tried pre-training the BERT model on the downstream task, but the error was still not resolved.

How can I solve this problem?

Here is my code:

import torch
from pytorch_lightning import Trainer

def enable_gradients(model):
    for param in model.parameters():
        param.requires_grad = True

# datamodule
ucc_data_module = UCC_Data_Module(train_path, val_path, test_path, attributes=attributes, batch_size=config['batch_size'])
ucc_data_module.setup()

# model
model = UCC_Comment_Classifier()

enable_gradients(model)

# trainer and fit
# Instantiation of the Lightning Trainer
trainer = Trainer(max_epochs=config['n_epochs'], accelerator='gpu', num_sanity_val_steps=1)

try:
    trainer.fit(model, ucc_data_module)
    torch.save(model.state_dict(), PATH)
except RuntimeError as e:
    print(e)

Here is the error:

ProcessRaisedException: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 
147, in _wrapping_function
    results = function(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 568, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 973, in _run
    results = self._run_stage()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1016, in _run_stage
    self.fit_loop.run()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
    self.advance()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 218, in 
advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 185, in 
run
    self._optimizer_step(kwargs.get("batch_idx", 0), closure)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 260, in 
_optimizer_step
    call._call_lightning_module_hook(
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 144, in 
_call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1256, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 155, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 256, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 225, in 
optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 114,
in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/optimization.py", line 439, in step
    loss = closure()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 101,
in _wrap_closure
    closure_result = closure()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 140, in 
__call__
    self._result = self.closure(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 135, in 
closure
    self._backward_fn(step_output.closure_loss)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 232, in 
backward_fn
    call._call_strategy_hook(self.trainer, "backward", loss, optimizer)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 291, in 
_call_strategy_hook
    output = fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 200, in backward
    self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 67, 
in backward
    model.backward(tensor, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1046, in backward
    loss.backward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
1 Answer

The fix is to add torch.set_grad_enabled(True) at the beginning of training_step, or to use the AdamW optimizer from torch.
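A minimal sketch of both suggestions, assuming automatic optimization and a forward pass that returns logits; the batch keys, loss function, and learning rate are assumptions, since the original UCC_Comment_Classifier is not shown:

import torch
import pytorch_lightning as pl

class UCC_Comment_Classifier(pl.LightningModule):
    # __init__ and forward are unchanged from the original model (not shown here).

    def training_step(self, batch, batch_idx):
        # Re-enable autograd at the start of the step in case it was switched
        # off globally (e.g. by a stray torch.set_grad_enabled(False) elsewhere).
        torch.set_grad_enabled(True)
        logits = self(batch["input_ids"], batch["attention_mask"])   # assumed batch keys
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, batch["labels"].float()
        )
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        # torch's AdamW instead of the transformers AdamW that appears in the
        # traceback (transformers/optimization.py); the learning rate is an assumption.
        return torch.optim.AdamW(self.parameters(), lr=2e-5)

Note that the set_grad_enabled(True) call is a workaround: the usual root cause of this error is that gradients were disabled globally, or that the loss was computed under torch.no_grad(), so the returned loss has no grad_fn to backpropagate through.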
