tensorflow-gpu v1.13.1, CUDA: 10.0, cuDNN: 7.5.1, graphics card: RTX 2080, Ubuntu: 18.04
I am currently trying to train an LSTM model in TensorFlow using CuDNNLSTM, but every time I run my code I get the following error:
2019-04-28 23:43:48.936154: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-04-28 23:43:48.936212: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cudnn_rnn_ops.cc:1217 : Unknown: Fail to find the dnn implementation.
Traceback (most recent call last):
  File "/home/nicholas/PycharmProjects/deepLearninginKeras/crypto_currency_predict/crypto.py", line 139, in <module>
    callbacks=[tensorboard, checkpoint])
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 329, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: Fail to find the dnn implementation.
     [[{{node cu_dnnlstm/CudnnRNN}}]]
     [[{{node ConstantFoldingCtrl/loss/dense_1_loss/broadcast_weights/assert_broadcastable/AssertGuard/Switch_0}}]]
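I am not sure what is causing this. I suspect it may be partly because the CUDA version I installed is different from the one reported for my graphics card. When I run "nvidia-smi" in a terminal, I get the following:
NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1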
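A note on reading that output: the "CUDA Version" field printed by nvidia-smi is the highest CUDA version the installed driver supports, not the toolkit installed under /usr/local/cuda, so a 10.1 there does not by itself conflict with a CUDA 10.0 install. Below is a small sketch for pulling both numbers out of the header line. The function name parse_smi_header is made up for illustration, and the regex assumes the English header layout.

```python
import re


def parse_smi_header(line):
    """Return (driver_version, cuda_version) from an nvidia-smi header line, or None."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


# Header line as reported in the question (spacing is an assumption).
header = "NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1"
print(parse_smi_header(header))
```

The toolkit version that TensorFlow actually links against has to be checked separately, for example by looking at what /usr/local/cuda points to.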
At the bottom of my ~/.bashrc I have the following paths:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/$/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
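One cheap thing to rule out with paths like these is an entry that does not point at a real directory, since the dynamic loader silently skips such entries. A minimal sketch follows; missing_library_dirs is a hypothetical helper, not part of any library, and it only checks that each entry exists, not that it contains a matching cuDNN build.

```python
import os


def missing_library_dirs(path_value):
    """Return the entries of a search-path string that are not existing directories."""
    # Empty entries (from leading, trailing, or doubled separators) are ignored.
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return [entry for entry in entries if not os.path.isdir(entry)]


if __name__ == "__main__":
    # Report every LD_LIBRARY_PATH entry that does not exist on this machine.
    for entry in missing_library_dirs(os.environ.get("LD_LIBRARY_PATH", "")):
        print("not a directory:", entry)
```

Run it from a shell that has sourced ~/.bashrc, so that it sees the same LD_LIBRARY_PATH as the training script.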
Any insight would be greatly appreciated. Here is a sample layer from my model:
model.add(tf.keras.layers.CuDNNLSTM(128, input_shape=train_x.shape[1:], return_sequences=True))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.BatchNormalization())
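For example, would it be better to move to Ubuntu 16, or would that not fix the problem?
This seems to be a very common problem with RTX 20xx cards.
For me, adding the following configuration before setting up the model fixed the problem: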
import tensorflow as tf
# Allocate GPU memory on demand instead of reserving all of it at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.Session(config=config)
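In a tf.keras script it may be more robust to hand the session to Keras explicitly, so that model.fit is certain to use it. Here is a sketch of that variant, assuming TensorFlow 1.x as in the question (in 2.x, tf.ConfigProto and tf.Session are only available under tf.compat.v1):

```python
import tensorflow as tf  # assumes TensorFlow 1.x, e.g. the 1.13.1 from the question

# Let the GPU allocator grow on demand instead of reserving all memory at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Register the session with Keras so layers such as CuDNNLSTM run inside it.
session = tf.Session(config=config)
tf.keras.backend.set_session(session)

# Build and fit the model only after this point.
```

The ordering matters: the options have to be in place before TensorFlow first initialises the GPU, so this belongs at the top of the script, before any layers are created.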