RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I am trying to fine-tune a `tacotron2` model with the following CUDA and PyTorch versions:

cudatoolkit               10.0.130                      0  
cudnn                     7.6.5                cuda10.0_0  
pytorch                   1.2.0           cuda100py37h938c94c_0  
torchvision               0.4.0           cuda100py37hecfc37a_0  
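
The listing above shows the installed packages; what PyTorch actually detects at runtime can be checked separately. The following is a minimal sketch, assuming only standard PyTorch attributes. The helper name `collect_versions` is hypothetical, and the function returns `None` for anything it cannot detect.

```python
def collect_versions():
    """Return the versions PyTorch reports at runtime (None where unavailable)."""
    info = {"torch": None, "cuda": None, "cudnn": None, "gpu": None, "capability": None}
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not importable in this environment

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda               # CUDA version PyTorch was built against
    info["cudnn"] = torch.backends.cudnn.version()  # e.g. 7605 for cuDNN 7.6.5
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)  # (major, minor)
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Comparing the reported compute capability with the CUDA version of the build is usually the first thing to look at when a cuDNN call fails on a GPU that is newer than the toolkit.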

When I train tacotron2 with `python train.py`, I get the following error:

Epoch: 0
Traceback (most recent call last):
  File "train.py", line 307, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 228, in train
    y_pred = model(x)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zakipoint/tts/tacotron2/model.py", line 505, in forward
    encoder_outputs = self.encoder(embedded_inputs, text_lengths)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zakipoint/tts/tacotron2/model.py", line 185, in forward
    outputs, _ = self.lstm(x)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 562, in forward
    return self.forward_packed(input, hx)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 554, in forward_packed
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 529, in forward_impl
    self.num_layers, self.dropout, self.training, self.bidirectional)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
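
The traceback ends inside the cuDNN LSTM call on a packed sequence, so it may help to reproduce that call in isolation. The sketch below is an assumption about how to narrow the problem down, not part of the tacotron2 code: `run_lstm_smoke_test` is a hypothetical helper, the tensor sizes are arbitrary, and `CUDA_LAUNCH_BLOCKING=1` is set so that CUDA errors surface at the call that caused them.

```python
import os

# Must be set before CUDA is initialised; makes kernel launches synchronous.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def run_lstm_smoke_test(use_cudnn=True):
    """Run a tiny bidirectional LSTM on a packed batch and return the output shape."""
    try:
        import torch
    except ImportError:
        return "torch not installed"

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn  # False falls back to the native kernels
    try:
        lstm = torch.nn.LSTM(input_size=8, hidden_size=4,
                             batch_first=True, bidirectional=True).to(device)
        x = torch.randn(2, 5, 8, device=device)
        lengths = torch.tensor([5, 3])  # sorted in descending order, kept on the CPU
        packed = torch.nn.utils.rnn.pack_padded_sequence(x, lengths, batch_first=True)
        out, _ = lstm(packed)
        padded, _ = torch.nn.utils.rnn.pad_packed_sequence(out, batch_first=True)
        return tuple(padded.shape)
    finally:
        torch.backends.cudnn.enabled = previous  # restore the global setting


if __name__ == "__main__":
    print("with cuDNN:   ", run_lstm_smoke_test(use_cudnn=True))
    print("without cuDNN:", run_lstm_smoke_test(use_cudnn=False))
```

If the first call fails with the same cuDNN error while the second succeeds, the problem is more likely in the CUDA/cuDNN installation than in the training script or the data.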

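I have tested with `batch_size` set to 2, 4, 8, 16, and so on, but reducing `batch_size` does not help. I am training on an NVIDIA GeForce RTX 3090. During training, CPU usage is at 100% while GPU usage stays below 1%. There are some similar issues on the tacotron2 GitHub repository, but none of the suggested solutions fixed the problem for me. I am looking for a reliable fix for this runtime error. Thanks!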
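
One possibility worth ruling out, offered as a hypothesis rather than a confirmed diagnosis: the RTX 3090 is an Ampere GPU with compute capability 8.6, while the environment uses CUDA 10.0, which predates that architecture. The sketch below compares a toolkit version against a compute capability. The table `MIN_CUDA_FOR_CAPABILITY` and both helper functions are illustrative, and the version numbers in the table should be verified against NVIDIA's CUDA release notes.

```python
# Illustrative table (an assumption to verify against NVIDIA's release notes):
# the first CUDA toolkit release that can target each compute capability.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (GeForce RTX 30 series)
}


def parse_version(text):
    """Turn a version string such as '10.0.130' into a (major, minor) tuple."""
    major, minor = (text.split(".") + ["0"])[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version, capability):
    """Return True or False, or None when the capability is not in the table."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if minimum is None:
        return None
    return parse_version(cuda_version) >= minimum


if __name__ == "__main__":
    # Values taken from the environment described in the question.
    print(toolkit_supports("10.0.130", (8, 6)))
```

If the check reports a mismatch, the usual remedy is a PyTorch build compiled against a CUDA version that supports the GPU, which generally also means a newer PyTorch release than 1.2.0.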

python cuda text-to-speech torch