I am getting the CUDNN_STATUS_INTERNAL_ERROR shown below.
python train_v2.py
Traceback (most recent call last):
File "train_v2.py", line 113, in <module>
main()
File "train_v2.py", line 74, in main
model.cuda()
File "/home/ahkim/Desktop/squad_vteam/src/model.py", line 234, in cuda
self.network.cuda()
File "/home/ahkim/anaconda3/envs/san/lib/python3.6/site-packages/torch/nn/modules/module.py", line 249, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/ahkim/anaconda3/envs/san/lib/python3.6/site-packages/torch/nn/modules/module.py", line 176, in _apply
module._apply(fn)
File "/home/ahkim/anaconda3/envs/san/lib/python3.6/site-packages/torch/nn/modules/module.py", line 176, in _apply
module._apply(fn)
File "/home/ahkim/anaconda3/envs/san/lib/python3.6/site-packages/torch/nn/modules/module.py", line 176, in _apply
module._apply(fn)
File "/home/ahkim/anaconda3/envs/san/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 112, in _apply
self.flatten_parameters()
File "/home/ahkim/anaconda3/envs/san/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 105, in flatten_parameters
self.batch_first, bool(self.bidirectional))
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR
What should I try to resolve this? I already tried deleting .nv, but it did not help.
nvidia-smi
Wed Aug 8 10:56:29 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67 Driver Version: 390.67 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 00000000:04:00.0 Off | N/A |
| 22% 21C P8 15W / 250W | 125MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 00000000:05:00.0 Off | N/A |
| 22% 24C P8 14W / 250W | 11MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX TIT... Off | 00000000:08:00.0 Off | N/A |
| 22% 23C P8 14W / 250W | 11MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX TIT... Off | 00000000:09:00.0 Off | N/A |
| 22% 23C P8 15W / 250W | 11MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX TIT... Off | 00000000:85:00.0 Off | N/A |
| 22% 24C P8 14W / 250W | 11MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX TIT... Off | 00000000:86:00.0 Off | N/A |
| 22% 23C P8 15W / 250W | 11MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce GTX TIT... Off | 00000000:89:00.0 Off | N/A |
| 22% 21C P8 15W / 250W | 11MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce GTX TIT... Off | 00000000:8A:00.0 Off | N/A |
| 22% 23C P8 15W / 250W | 11MiB / 12212MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1603 C /usr/bin/python 114MiB |
+-----------------------------------------------------------------------------+
With NVIDIA Driver Version 396.26 (CUDA V9.1.85, torch.backends.cudnn.version(): 7102) the same code runs without error. With Driver Version 390.67 (same CUDA V9.1.85, torch.backends.cudnn.version(): 7102) I get the error above.
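A quick way to check whether the failure comes from the driver/cuDNN combination rather than from the training script is to move a bare LSTM to the GPU, since that reproduces the flatten_parameters() call in the traceback. This is only a diagnostic sketch; the layer sizes are arbitrary:

import torch
import torch.nn as nn

print(torch.version.cuda)               # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())   # cuDNN version seen by PyTorch
print(torch.cuda.is_available())

# Moving an RNN module to the GPU triggers flatten_parameters(), the call
# that raises CUDNN_STATUS_INTERNAL_ERROR in the traceback above.
rnn = nn.LSTM(input_size=16, hidden_size=32, num_layers=1, batch_first=True)
rnn.cuda()

x = torch.randn(4, 10, 16).cuda()
out, _ = rnn(x)
print(out.shape)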
I resolved it with the following steps.
export LD_LIBRARY_PATH="/usr/local/cuda-9.1/lib64"
Go to the PyTorch website (https://pytorch.org/) and pick the wheel that matches your CUDA version, for example:
cu100 = CUDA 10.0
pip3 uninstall torch
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl
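After reinstalling, a short sanity check (a sketch, assuming the cu100 wheel above matches the CUDA toolkit installed on the machine) is to confirm the versions PyTorch reports and that a tensor can be placed on the GPU:

import torch

print(torch.__version__)                # installed PyTorch build
print(torch.version.cuda)               # CUDA version the wheel was built for
print(torch.backends.cudnn.version())   # bundled cuDNN version
print(torch.backends.cudnn.enabled)
print(torch.cuda.is_available())

x = torch.randn(2, 3).cuda()            # should succeed without a cuDNN error
print(x.device)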