TensorFlow Docker not using GPU

Question (votes: 0, answers: 1)

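I am trying to get TensorFlow running on Ubuntu 24.04.1 with a GPU.

According to this page:

Docker is the easiest way to run TensorFlow on a GPU, since the host machine only needs the NVIDIA® driver

So I am trying to use Docker.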

To make sure my GPU works with Docker, I ran:

docker run --gpus all --rm nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04 nvidia-smi

The output is:

==========
== CUDA ==
==========

CUDA Version 12.6.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Sat Oct 26 01:16:50 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA TITAN RTX               Off |   00000000:01:00.0 Off |                  N/A |
| 41%   40C    P8             24W /  280W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

(Side note: I did not use the command they suggest,

docker run --gpus all --rm nvidia/cuda nvidia-smi

because it does not work: nvidia/cuda no longer has a latest tag.)

So that part appears to be working. However, when I run:

docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
   python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

the output is:

2024-10-26 01:20:51.021242: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1729905651.033544       1 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1729905651.037491       1 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-26 01:20:51.050486: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
W0000 00:00:1729905652.350499       1 gpu_device.cc:2344] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]

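This indicates that TensorFlow is not detecting the GPU: the final line is an empty device list, and the warning just above it says some GPU libraries could not be loaded.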
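Since the warning is about dlopen failing, one way to narrow it down is to try loading the CUDA libraries directly from inside the container. The following is only a rough sketch: the library file names in CANDIDATE_LIBS are assumptions for a CUDA 12 / cuDNN 9 build and are not taken from the log above, so adjust them to whatever your TensorFlow version expects.

```python
import ctypes

# ASSUMPTION: sonames for a CUDA 12 / cuDNN 9 build. They are illustrative
# and may not match the versions your TensorFlow wheel was built against.
CANDIDATE_LIBS = [
    "libcuda.so.1",      # driver library, normally provided by the host
    "libcudart.so.12",
    "libcublas.so.12",
    "libcufft.so.11",
    "libcusparse.so.12",
    "libcudnn.so.9",
]


def probe_libraries(names):
    """Return {name: bool}, True when the dynamic loader can open the library."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot find or open the shared object.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe_libraries(CANDIDATE_LIBS).items():
        print(("ok      " if ok else "MISSING ") + name)
```

This only tests the default dynamic-loader search path, which may differ from how TensorFlow locates its libraries, so treat a MISSING line as a hint about where to look rather than as proof.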

What am I doing wrong here?

python docker tensorflow gpu
1 answer

0 votes

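I don't think you are doing anything wrong, but I suspect the image may be one "pip install" short of a complete image.

I am running a different flavor of Linux, but first I had to make sure my GPU was available to Docker (see here: Add nvidia runtime to docker runtimes), and I upgraded my CUDA version to the latest.

Even after doing all of that, I got the same error as you.

So I started a shell in the container as follows: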

docker run -it --rm --runtime=nvidia --gpus all tensorflow/tensorflow:latest-gpu /bin/bash

Then I ran

pip install tensorflow[and-cuda]

Some of the dependencies were already present, while others were missing and had to be installed.
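If you want to see what was present before and after that install, a small sketch like the one below lists the installed Python packages whose names start with a given prefix. It assumes the CUDA libraries pulled in by tensorflow[and-cuda] are distributed as pip packages with names beginning with "nvidia-"; that prefix is my assumption, and installed_with_prefix is just an illustrative helper name.

```python
from importlib import metadata


def installed_with_prefix(prefix):
    """Return {package_name: version} for installed packages matching prefix."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with broken metadata (no Name field).
        if name and name.lower().startswith(prefix.lower()):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # ASSUMPTION: the CUDA wheels use the "nvidia-" name prefix.
    for name, version in sorted(installed_with_prefix("nvidia-").items()):
        print(name, version)
```

Running it inside the container once before and once after the pip install should show which packages were added.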

When that finished, I ran

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

and it finally found my GPU.

You will want to create your own Docker image using theirs as the base, for example:

# Use the official TensorFlow GPU base image
FROM tensorflow/tensorflow:latest-gpu

# Install TensorFlow with CUDA support
RUN pip install tensorflow[and-cuda]

# To make sure the CMD from the base image is preserved
CMD ["bash"]
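To confirm that an image built this way behaves differently from the stock one, the check from the question can be scripted. This is a sketch only: check_image and count_gpus are hypothetical helper names, and it assumes docker is on the PATH and that counting device_type='GPU' entries in the printed device list is good enough.

```python
import re
import subprocess

# The one-liner used throughout this thread.
CHECK = "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"


def count_gpus(output):
    """Count GPU entries in the printed list of physical devices."""
    return len(re.findall(r"device_type='GPU'", output))


def check_image(image):
    """Run the check inside `image` and return how many GPUs it reports."""
    cmd = ["docker", "run", "--gpus", "all", "--rm", image, "python", "-c", CHECK]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # TensorFlow's log messages go to stderr; the device list is on stdout.
    return count_gpus(proc.stdout)
```

For example, after building the Dockerfile above under a tag of your choosing (say my-tf-gpu, a made-up name), check_image("tensorflow/tensorflow:latest-gpu") and check_image("my-tf-gpu") can be compared side by side.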