For the tacotron2 model, I have the following CUDA and PyTorch versions:
cudatoolkit   10.0.130   0
cudnn         7.6.5      cuda10.0_0
pytorch       1.2.0      cuda100py37h938c94c_0
torchvision   0.4.0      cuda100py37hecfc37a_0
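The package list above shows what conda installed, but not necessarily what PyTorch sees at runtime. As a rough sketch, the following prints the versions reported by the PyTorch build itself. The helper name describe_torch_build is hypothetical (it is not part of tacotron2 or PyTorch); it only uses torch.__version__, torch.version.cuda, torch.backends.cudnn.version() and the torch.cuda query functions.

```python
import importlib


def describe_torch_build():
    """Return version info for the installed PyTorch build, or None if PyTorch is missing.

    Hypothetical helper for illustration only.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None

    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,          # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),   # None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # Name and compute capability (major, minor) of the first visible GPU
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    print(describe_torch_build())
```

Running this inside the tts_train environment should make it easier to compare the CUDA version the build was compiled against with the GPU that is actually in use.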
When I run python train.py to train tacotron2, I get the following error:
Epoch: 0
Traceback (most recent call last):
  File "train.py", line 307, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 228, in train
    y_pred = model(x)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zakipoint/tts/tacotron2/model.py", line 505, in forward
    encoder_outputs = self.encoder(embedded_inputs, text_lengths)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zakipoint/tts/tacotron2/model.py", line 185, in forward
    outputs, _ = self.lstm(x)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 562, in forward
    return self.forward_packed(input, hx)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 554, in forward_packed
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/home/zakipoint/miniconda3/envs/tts_train/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 529, in forward_impl
    self.num_layers, self.dropout, self.training, self.bidirectional)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
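I have tested with batch_size set to 2, 4, 8, 16, and so on, but reducing batch_size did not help. I am training on an NVIDIA GeForce RTX 3090. During training, CPU usage is at 100% while GPU usage stays below 1%. There are some similar issues on the tacotron2 GitHub repository, but none of the suggested solutions fixed the problem for me. I am looking for a reliable fix for this runtime error. Thanks!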
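One thing that may be worth ruling out is a mismatch between the CUDA toolkit and the GPU generation. The RTX 3090 is an Ampere card with compute capability 8.6, and Ampere support first appeared in CUDA 11, whereas the environment listed here uses cudatoolkit 10.0.130. The sketch below encodes only that one rule; the function names and the lookup table are hypothetical and deliberately incomplete, so treat this as a sanity check under that assumption rather than a confirmed diagnosis.

```python
# Assumption: first CUDA release that can target each compute-capability major
# version. Only Ampere (8.x -> CUDA 11.0) is listed; extend as needed.
FIRST_CUDA_FOR_MAJOR = {8: (11, 0)}


def parse_version(text):
    """Turn a version string such as '10.0.130' into (10, 0)."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def toolkit_predates_gpu(cuda_version, capability):
    """Return True if the CUDA toolkit is older than the first release
    known (per the table above) to support the GPU's architecture."""
    required = FIRST_CUDA_FOR_MAJOR.get(capability[0])
    if required is None:
        return False  # no rule recorded for this architecture
    return parse_version(cuda_version) < required


# Values from this post: cudatoolkit 10.0.130 and an RTX 3090 (capability 8.6)
print(toolkit_predates_gpu("10.0.130", (8, 6)))  # → True
```

If the check returns True for your setup, trying a PyTorch build compiled against CUDA 11 or newer would be a reasonable next experiment.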