I'm redoing my CUDA/CuDNN setup on a fresh profile on Ubuntu 22.04.3, because Keras was throwing a libcublasLt error I could not fix. For project compatibility I need TF 2.12 with CUDA 11.8 and CuDNN 8.6. I'm working on a machine that already has several CUDA versions installed.
$ cd /usr/local/
shows:
cuda cuda-11.8 cuda-12 cuda-12.0 cuda-12.3
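For context: on a default install the unversioned cuda entry is a symlink, and anything that hard-codes /usr/local/cuda will follow it regardless of what is on PATH. This is how I checked where it currently points:

$ readlink -f /usr/local/cuda
$ ls -l /usr/local | grep cuda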
I'm now at step 2.4.3 of the CuDNN installation guide, the CuDNN verification, and it returns the following output:
rm -rf *o
rm -rf mnistCUDNN
CUDA_VERSION is 12030
Linking agains cublasLt = true
CUDA VERSION: 12030
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 35 50 53 60 61 62 70 72 75 80 86 87
test.c:1:10: fatal error: FreeImage.h: No such file or directory
1 | #include "FreeImage.h"
| ^~~~~~~~~~~~~
compilation terminated.
>>> WARNING - FreeImage is not set up correctly. Please ensure FreeImage is set up correctly. <<<
[@] /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o fp16_dev.o -c fp16_dev.cu
[@] g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
[@] g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
[@] /usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcublasLt -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
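The FreeImage warning itself looks like a separate issue: the mnistCUDNN sample just needs the FreeImage development headers to build, and as far as I know they can be installed from the standard Ubuntu repositories before re-running make:

$ sudo apt-get install libfreeimage3 libfreeimage-dev
$ make clean && make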
Notably, it says CUDA VERSION: 12030, which I assume means 12.3. That is odd, because
$ nvcc --version
shows:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
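My guess is that the sample's Makefile calls /usr/local/cuda/bin/nvcc directly (as the [@] lines above show), while the nvcc on my PATH resolves to cuda-11.8, so the two disagree whenever the unversioned /usr/local/cuda symlink points at cuda-12.3. A quick way to compare them, and to repoint the symlink if 11.8 is the one that should win:

$ which nvcc                          # resolves via PATH, i.e. cuda-11.8
$ /usr/local/cuda/bin/nvcc --version  # what the sample's Makefile actually uses
$ sudo ln -sfn /usr/local/cuda-11.8 /usr/local/cuda   # repoint the symlink if desired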
My .bashrc contains the following lines:
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
Now I'm wondering why I was getting
Could not load library libcublasLt.so.10. Error: libcublasLt.so.10: cannot open shared object file: No such file or directory
(this is the original problem that made me try to redo the whole setup)
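For reference, this is how I checked which libcublasLt libraries the dynamic loader can actually see; as far as I can tell, CUDA 11.8 ships libcublasLt.so.11, so a request for .so.10 hints that the TensorFlow build installed at the time targeted an older CUDA:

$ ldconfig -p | grep libcublasLt
$ ls /usr/local/cuda-11.8/lib64/libcublasLt*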
Try this, but adapt the paths to your own setup:
# Copy header files
sudo cp /home/adesoji/cudnn-linux-x86_64-8.9.6.50_cuda12-archive/include/cudnn*.h /usr/lib/cuda/include/
sudo cp /home/adesoji/cudnn-linux-x86_64-8.9.6.50_cuda12-archive/lib64/libcudnn* /usr/lib/cuda/lib64/
sudo cp /home/adesoji/cudnn-linux-x86_64-8.9.6.50_cuda12-archive/lib/libcudnn* /usr/lib/cuda/lib64/
sudo chmod a+r /usr/lib/cuda/include/cudnn.h /usr/lib/cuda/lib64/libcudnn*
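After copying, it is worth confirming that the headers now in place are the intended cuDNN release; on cuDNN 8.x the version macros live in cudnn_version.h (path assumed to match the copy destination above):

grep -A 2 CUDNN_MAJOR /usr/lib/cuda/include/cudnn_version.h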
Install CUDA 12.0 from the NVIDIA website:
https://developer.nvidia.com/cuda-12-0-0-download-archive
Then install libcublas from APT by running
sudo apt install libcublas-12-0
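To confirm the package actually installed and see where its libcublasLt landed, the usual dpkg queries work (package name as installed above):

dpkg -l | grep libcublas
dpkg -L libcublas-12-0 | grep libcublasLt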
Now go to the terminal and run
ls /usr/local | grep cuda   # to find your CUDA path
export PATH=/depot/cuda/cuda-12.0/bin:$PATH
export PATH=/depot/cuda/cuda-11.8/bin:$PATH
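These exports only last for the current shell. To persist them, and so the runtime linker can also find the libraries, I would append them to ~/.bashrc together with LD_LIBRARY_PATH (the paths below are examples, adjust them to your own install as above):

echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
nvcc --version   # should now report the intended release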
You should be fine with these. My Python and TensorFlow versions are shown below:
Python 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-12-05 19:44:58.346827: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> print (tf.__version__)
2.13.1
>>> import keras as ks
>>> print (ks.__version__)
2.13.1
>>>
My Keras and TensorFlow work fine with CUDA.
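As a final sanity check, this one-liner (standard TensorFlow API) should list the GPU if the CUDA and cuDNN libraries are being found at runtime:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"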