TensorFlow GPU support does not work on Ubuntu

Question (0 votes, 1 answer)

I am trying to run Keras on my GPU.

My setup:

  • NVIDIA GeForce RTX 3070
  • Ubuntu 22.04
  • Python 3.10

I installed the NVIDIA driver with:

sudo ubuntu-drivers install

Under "Software & Updates / Additional Drivers" it reports that nvidia-driver-535 is in use, so a driver is installed.

I then installed the CUDA toolkit with:

sudo apt-get install nvidia-cuda-dev  nvidia-cuda-toolkit

I also installed cuDNN with:

sudo apt install nvidia-cudnn

and TensorFlow, which already includes Keras, with:

pip install tensorflow

However, when I list the physical devices through the TensorFlow library itself, only the CPU is listed.

import tensorflow as tf
print(tf.config.list_physical_devices())
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

Importing TensorFlow prints the following:

2024-06-26 23:15:15.129300: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-26 23:15:15.131933: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-26 23:15:15.170793: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-26 23:15:15.699070: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-26 23:15:16.077326: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-26 23:15:16.081814: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

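It seems that the CUDA drivers are not found and that "TensorRT" is missing.

This is a fresh Ubuntu installation, and I have not installed any other Python packages yet.

What do I need to do to get this working?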
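The "Cannot dlopen some GPU libraries" warning does not say which library failed. A minimal sketch for narrowing that down is to try loading each shared library directly with `ctypes`. The library names below are assumptions for a CUDA 12.x / cuDNN 8.x installation and are not taken from the log; they need to be adjusted to the versions actually installed.

```python
import ctypes

# Assumed sonames for CUDA 12.x and cuDNN 8.x (illustrative defaults only).
DEFAULT_LIBS = ("libcuda.so.1", "libcudart.so.12", "libcublas.so.12", "libcudnn.so.8")


def probe_libraries(names=DEFAULT_LIBS):
    """Try to dlopen each library.

    Returns a dict mapping each name to None on success,
    or to the loader's error message on failure.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in probe_libraries().items():
        status = "ok" if error is None else f"FAILED ({error})"
        print(f"{name}: {status}")
```

A library that fails here cannot be loaded by TensorFlow through the default search path either, which points at a missing package or a missing `LD_LIBRARY_PATH` entry.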

python tensorflow keras gpu nvidia
Answer (0 votes)

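After several attempts, I got one of the listed version combinations working. For anyone interested, here is the exact installation procedure I followed to get TensorFlow GPU support running:

Prerequisites:

Required: Ubuntu 20.04 or Ubuntu 22.04

Build tools (including "make") for compiling code later: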

sudo apt-get install build-essential

Required package versions

https://www.tensorflow.org/install/source#gpu

In my case:

  • TensorFlow 2.15.0
  • Python 3.9-3.11
  • Clang 16.0.0
  • Bazel 6.1.0
  • cuDNN 8.9
  • CUDA 12.2

Step 1: NVIDIA driver

sudo ubuntu-drivers list
sudo ubuntu-drivers install

Step 2: Reboot so that the graphics driver takes effect

Step 3: CUDA 12.2
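After the reboot, one way to confirm that the driver is loaded is to query `nvidia-smi`. This is a sketch, not part of the original procedure: the helper names are hypothetical, and it assumes `nvidia-smi` was installed alongside the driver and is on the `PATH`.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Split one CSV line 'name, driver_version' into a (name, driver) tuple."""
    name, driver = (part.strip() for part in line.split(",", 1))
    return name, driver


def query_gpus(executable="nvidia-smi"):
    """Return a list of (gpu_name, driver_version), or None if the query fails."""
    if shutil.which(executable) is None:
        return None  # tool not found, so the driver is probably not installed
    proc = subprocess.run(
        [executable, "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if proc.returncode != 0:
        return None  # tool present but cannot talk to the driver
    return [parse_gpu_line(line) for line in proc.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print(query_gpus())
```

A result of `None` after a reboot suggests the driver is not loaded, in which case the later CUDA steps will not help.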

https://developer.nvidia.com/cuda-12-2-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
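To check which toolkit release ended up installed, the output of `nvcc --version` can be parsed. This is a sketch under assumptions: the default path `/usr/local/cuda/bin/nvcc` is where NVIDIA's packages usually place the compiler, but it may differ on a given system, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from nvcc's 'release X.Y' text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_toolkit_release(nvcc="/usr/local/cuda/bin/nvcc"):
    """Run nvcc and return its release as (major, minor), or None if unavailable."""
    path = shutil.which(nvcc)
    if path is None:
        return None
    proc = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    if proc.returncode != 0:
        return None
    return parse_nvcc_release(proc.stdout)


if __name__ == "__main__":
    print("CUDA toolkit release:", cuda_toolkit_release())
```

For the installation above, the expected result would be `(12, 2)`.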

Step 4: cuDNN 8.9.5 for CUDA 12.x

https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-895/install-guide/index.html
https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-895/install-guide/index.html#installlinux-deb

Download:

https://developer.nvidia.com/rdp/cudnn-archive

  • "Download cuDNN v8.9.5 (October 27th, 2023), for CUDA 12.x"
  • "Local Installer for Ubuntu22.04 x86_64 (Deb)"

Install:
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.5.30_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.5.30/cudnn-local-FB167084-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install libcudnn8=8.9.5.30-1+cuda12.2
sudo apt-get install libcudnn8-dev=8.9.5.30-1+cuda12.2
sudo apt-get install libcudnn8-samples=8.9.5.30-1+cuda12.2
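As a lightweight alternative to the sample-based verification, the installed cuDNN version can be read from the header shipped with `libcudnn8-dev`. This is a sketch: the header location `/usr/include/cudnn_version.h` is an assumption for the Debian packages and may differ, and the helper names are hypothetical.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn_version.h contents, or None."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{key}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def installed_cudnn_version(header="/usr/include/cudnn_version.h"):
    """Read the header file (assumed path) and return the cuDNN version, or None."""
    path = Path(header)
    if not path.is_file():
        return None
    return parse_cudnn_version(path.read_text(errors="replace"))


if __name__ == "__main__":
    print("cuDNN version:", installed_cudnn_version())
```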

The "verify install" section of the documentation did not work as described for me, but GPU support works anyway, so I did not pursue it.

Step 5: TensorFlow

pip install tensorflow==2.15.0

pip install worked fine, so the build tools Bazel and Clang are not needed after all.

Step 6: Verify GPU support in Python
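To see which CUDA and cuDNN versions the installed wheel was built against, TensorFlow exposes `tf.sysconfig.get_build_info()`. The sketch below reads a few keys from it; the key names are the ones I would expect in recent releases and are accessed with `.get()` in case they are absent. The helper names are hypothetical.

```python
def build_summary(info):
    """Pick the CUDA-related fields out of a TensorFlow build-info mapping."""
    return {
        "is_cuda_build": bool(info.get("is_cuda_build", False)),
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }


def tensorflow_build_summary():
    """Import TensorFlow lazily and summarize its build info."""
    import tensorflow as tf  # imported here so the helper above works without it

    return build_summary(tf.sysconfig.get_build_info())


if __name__ == "__main__":
    try:
        print(tensorflow_build_summary())
    except ImportError:
        print("tensorflow is not installed")
```

If the reported CUDA version differs from the toolkit installed in Step 3, that mismatch is a likely cause of the GPU not being registered.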

import tensorflow as tf

print(tf.config.list_physical_devices(device_type=None))
# Output: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# Output: Num GPUs Available:  1
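Listing the device only shows that TensorFlow registered the GPU. A slightly stronger check is to run a small computation on it. The following is a sketch with hypothetical helper names; the matrix size is arbitrary, and TensorFlow is imported lazily so that the counting helper can be used on its own.

```python
from collections import Counter


def count_by_type(devices):
    """Count devices per device_type, e.g. {'CPU': 1, 'GPU': 1}."""
    return dict(Counter(device.device_type for device in devices))


def gpu_smoke_test(size=256):
    """Run a small matmul on the first GPU.

    Returns (device_counts, device_name), where device_name is the device the
    result tensor lives on, or None if no GPU is registered.
    """
    import tensorflow as tf

    counts = count_by_type(tf.config.list_physical_devices())
    if counts.get("GPU", 0) == 0:
        return counts, None
    with tf.device("/GPU:0"):
        a = tf.random.uniform((size, size))
        product = tf.matmul(a, a)
    return counts, product.device


if __name__ == "__main__":
    try:
        print(gpu_smoke_test())
    except ImportError:
        print("tensorflow is not installed")
```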

Done
