I'm running into a problem while training a comment classification model with PyTorch Lightning and a pretrained BERT model.
During training I get the following error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
For context, I have already enabled gradients on all of the model's parameters using a function enable_gradients(model), but the error persists.
My model is based on the aubmindlab/bert-base-arabertv02-twitter pretrained checkpoint, and I noticed that some of the BERT weights are reported as not properly initialized when the model is loaded. I have made sure I am using the latest versions of PyTorch, Transformers, and PyTorch Lightning.
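For illustration, even a minimal load of the checkpoint (a simplified sketch, not my actual classifier code) triggers the initialization warning:

from transformers import AutoModelForSequenceClassification

# Simplified, hypothetical load: the warning about newly initialized weights
# refers to the classification head, which does not exist in the pretrained
# checkpoint and is therefore randomly initialized when the model is created.
model = AutoModelForSequenceClassification.from_pretrained(
    'aubmindlab/bert-base-arabertv02-twitter', num_labels=len(attributes))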
Before training my specific model, I also tried pretraining the BERT model on the downstream task, but the error remained.
How can I fix this?
Here is my code:
import torch
from pytorch_lightning import Trainer

def enable_gradients(model):
    # Explicitly turn on gradient tracking for every parameter
    for param in model.parameters():
        param.requires_grad = True

# datamodule
ucc_data_module = UCC_Data_Module(train_path, val_path, test_path, attributes=attributes, batch_size=config['batch_size'])
ucc_data_module.setup()

# model
model = UCC_Comment_Classifier()
enable_gradients(model)

# trainer and fit
trainer = Trainer(max_epochs=config['n_epochs'], accelerator='gpu', num_sanity_val_steps=1)
try:
    trainer.fit(model, ucc_data_module)
    torch.save(model.state_dict(), PATH)
except RuntimeError as e:
    print(e)
Here is the error:
ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 147, in _wrapping_function
    results = function(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 568, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 973, in _run
    results = self._run_stage()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1016, in _run_stage
    self.fit_loop.run()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
    self.advance()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 218, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 185, in run
    self._optimizer_step(kwargs.get("batch_idx", 0), closure)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 260, in _optimizer_step
    call._call_lightning_module_hook(
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 144, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1256, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 155, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 256, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 225, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 114, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/optimization.py", line 439, in step
    loss = closure()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 101, in _wrap_closure
    closure_result = closure()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 140, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 135, in closure
    self._backward_fn(step_output.closure_loss)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 232, in backward_fn
    call._call_strategy_hook(self.trainer, "backward", loss, optimizer)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 291, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 200, in backward
    self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 67, in backward
    model.backward(tensor, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1046, in backward
    loss.backward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
The fix is to add torch.set_grad_enabled(True) at the beginning of training_step, or to use the AdamW optimizer from torch (the traceback shows the optimizer currently comes from transformers/optimization.py).
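As a minimal sketch of where the two fixes go: the module below is a hypothetical stand-in for the real UCC_Comment_Classifier, and the batch keys, num_labels, and learning rate are assumptions, not values from the question.

import torch
import pytorch_lightning as pl
from transformers import AutoModelForSequenceClassification

class UCC_Comment_Classifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Hypothetical stand-in: AraBERT backbone plus a freshly
        # initialized classification head.
        self.model = AutoModelForSequenceClassification.from_pretrained(
            'aubmindlab/bert-base-arabertv02-twitter', num_labels=2)

    def training_step(self, batch, batch_idx):
        # Fix 1: force autograd on at the top of training_step, so the
        # loss is built with a grad_fn even if gradient tracking was
        # switched off somewhere upstream.
        torch.set_grad_enabled(True)
        out = self.model(input_ids=batch['input_ids'],
                         attention_mask=batch['attention_mask'],
                         labels=batch['labels'])  # assumed batch layout
        self.log('train_loss', out.loss)
        return out.loss

    def configure_optimizers(self):
        # Fix 2: torch.optim.AdamW instead of the deprecated AdamW from
        # transformers.optimization that appears in the traceback.
        return torch.optim.AdamW(self.parameters(), lr=2e-5)

The root cause can differ between setups, so it is worth trying each change on its own: torch.set_grad_enabled(True) guarantees the loss requires grad, while switching to torch.optim.AdamW takes the transformers optimizer out of the optimizer.step path shown in the traceback.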