I am trying to run Keras on my GPU.
My setup:
I installed the NVIDIA drivers via
sudo ubuntu-drivers install
Under "Software & Updates / Additional Drivers" it says that it is using nvidia-driver-535, so a driver is installed.
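Then I installed the CUDA toolkit via
sudo apt-get install nvidia-cuda-dev nvidia-cuda-toolkit
I also installed cuDNN via
sudo apt install nvidia-cudnn
and TensorFlow, which already includes Keras, via
pip install tensorflow
However, when I list the physical devices through the TensorFlow library itself, it lists only the CPU.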
import tensorflow as tf
print(tf.config.list_physical_devices())
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
Importing TensorFlow prints the following:
2024-06-26 23:15:15.129300: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-26 23:15:15.131933: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-06-26 23:15:15.170793: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-26 23:15:15.699070: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-26 23:15:16.077326: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-26 23:15:16.081814: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
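It seems that the CUDA drivers are not found and that "TensorRT" is missing.
This is a fresh Ubuntu installation, and I have not installed any other Python packages yet.
What can I do to make this work?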
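One way to narrow down the "Cannot dlopen some GPU libraries" warning in the log is to try loading the shared libraries directly. The following is only a rough sketch using Python's standard `ctypes` module. The library names in `CANDIDATES` are assumptions for a CUDA 12.x / cuDNN 8.x setup and are not taken from the log; other versions use different suffixes. The helper name `probe` is made up for illustration.

```python
import ctypes

# Assumed sonames for a CUDA 12.x / cuDNN 8.x setup; adjust the suffixes
# to the versions actually installed on the machine.
CANDIDATES = (
    "libcuda.so.1",     # installed by the NVIDIA driver
    "libcudart.so.12",  # CUDA runtime
    "libcublas.so.12",  # cuBLAS
    "libcudnn.so.8",    # cuDNN
)


def probe(names=CANDIDATES):
    """Try to dlopen each library and report whether it could be loaded."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the dynamic loader cannot find or load the library.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe().items():
        print(f"{name}: {'loadable' if ok else 'NOT loadable'}")
```

Any library reported as not loadable is a candidate for the missing piece. A library that loads here can still be the wrong version for the installed TensorFlow wheel, so this only rules out the simplest failure.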
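After a few attempts, I got one of the version combinations from the list above to work. For anyone interested, here is the exact installation process I followed to get TensorFlow running with GPU support:
Required: Ubuntu 20.04 or Ubuntu 22.04
The "make" tools for building code later: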
sudo apt-get install build-essential
https://www.tensorflow.org/install/source#gpu
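The page linked above lists which CUDA and cuDNN versions each TensorFlow release was tested with. As a small sketch, the combination used in the rest of this answer (TensorFlow 2.15.0, CUDA 12.2, cuDNN 8.9.5.30) can be written down as version prefixes and compared against what is installed. The names `WORKING_COMBO`, `version_matches`, and `mismatches` are hypothetical helpers, not part of any library.

```python
# Version prefixes of the combination that worked in this answer.
WORKING_COMBO = {"tensorflow": "2.15", "cuda": "12.2", "cudnn": "8.9"}


def version_matches(installed: str, required_prefix: str) -> bool:
    """True if the dotted components of required_prefix lead `installed`."""
    have = installed.split(".")
    need = required_prefix.split(".")
    return have[: len(need)] == need


def mismatches(installed: dict) -> list:
    """Return the component names whose installed version does not match."""
    return [
        name
        for name, prefix in WORKING_COMBO.items()
        if not version_matches(installed.get(name, ""), prefix)
    ]


print(mismatches({"tensorflow": "2.15.0", "cuda": "12.2.0", "cudnn": "8.9.5.30"}))  # → []
```

Comparing dotted components rather than raw string prefixes avoids treating a version such as 12.20 as a match for 12.2.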
In my case:
sudo ubuntu-drivers list
sudo ubuntu-drivers install
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.0-535.54.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
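After the driver and CUDA packages are installed (and usually after a reboot), it can help to confirm which driver version the kernel actually loaded. The sketch below assumes the NVIDIA kernel module exposes `/proc/driver/nvidia/version` with a line starting with `NVRM version`, which is the usual layout on Linux but is not something this answer verified. The function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed location of the driver information file on Linux.
DRIVER_INFO = Path("/proc/driver/nvidia/version")


def extract_version(text):
    """Return the first dotted version number on the 'NVRM version' line."""
    for line in text.splitlines():
        if line.startswith("NVRM version"):
            match = re.search(r"\b(\d+\.\d+(?:\.\d+)?)\b", line)
            return match.group(1) if match else None
    return None


def loaded_driver_version(path=DRIVER_INFO):
    """Return the loaded driver version, or None if no driver file is readable."""
    try:
        return extract_version(path.read_text())
    except OSError:
        # File is missing or unreadable: the kernel module is probably not loaded.
        return None


if __name__ == "__main__":
    print("Loaded NVIDIA driver:", loaded_driver_version())
```

A result of `None` suggests the kernel module is not loaded, which would be consistent with TensorFlow reporting that it could not find the CUDA drivers.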
https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-895/install-guide/index.html https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-895/install-guide/index.html#installlinux-deb
https://developer.nvidia.com/rdp/cudnn-archive
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.5.30_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.5.30/cudnn-local-FB167084-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install libcudnn8=8.9.5.30-1+cuda12.2
sudo apt-get install libcudnn8-dev=8.9.5.30-1+cuda12.2
sudo apt-get install libcudnn8-samples=8.9.5.30-1+cuda12.2
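The three packages above are pinned to the same version string, where the part before the dash is the cuDNN version and the suffix names the CUDA release it was built for. As a sketch, the string can be taken apart and the pins checked for consistency. The regular expression and the `parse_cudnn_pin` helper are illustrative assumptions about this naming pattern, not an official format definition.

```python
import re

# Matches strings such as "8.9.5.30-1+cuda12.2".
PIN_RE = re.compile(
    r"^(?P<cudnn>\d+(?:\.\d+)*)-(?P<rev>\d+)\+cuda(?P<cuda>\d+\.\d+)$"
)


def parse_cudnn_pin(pin: str) -> dict:
    """Split a cuDNN package version into cuDNN version, revision, and CUDA version."""
    match = PIN_RE.match(pin)
    if match is None:
        raise ValueError(f"unrecognised version string: {pin!r}")
    return match.groupdict()


# The pins used in the apt-get commands above.
pins = {
    "libcudnn8": "8.9.5.30-1+cuda12.2",
    "libcudnn8-dev": "8.9.5.30-1+cuda12.2",
    "libcudnn8-samples": "8.9.5.30-1+cuda12.2",
}

parsed = {pkg: parse_cudnn_pin(version) for pkg, version in pins.items()}
# All three packages should resolve to the same cuDNN and CUDA versions.
consistent = len({tuple(p.items()) for p in parsed.values()}) == 1

print(parsed["libcudnn8"])  # → {'cudnn': '8.9.5.30', 'rev': '1', 'cuda': '12.2'}
print(consistent)           # → True
```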
The "verify install" section of the documentation did not work for me as described, but GPU support works regardless, so I did not worry about it.
pip install tensorflow==2.15.0
pip install worked fine, so the build tools bazel and clang were no longer needed.
import tensorflow as tf
print(tf.config.list_physical_devices(device_type=None))
# [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# Num GPUs Available: 1
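Beyond the device list above, it can be useful to see which CUDA and cuDNN versions the installed TensorFlow wheel was built against. The following is a minimal sketch using `tf.sysconfig.get_build_info()` and `tf.test.is_built_with_cuda()`. The `gpu_report` function is a hypothetical helper, and the `cuda_version` and `cudnn_version` keys are read with `.get` on the assumption that they may be absent from CPU-only builds.

```python
import importlib.util


def gpu_report():
    """Summarise TensorFlow's view of the GPU, or return None if TensorFlow is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    build = tf.sysconfig.get_build_info()
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # These keys may be missing on CPU-only builds, hence .get().
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    print(gpu_report())
```

If the reported CUDA or cuDNN version differs from what is installed system-wide, that mismatch is a plausible reason for an empty GPU list.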