Docker containers with CUDA can't see my GPU | WSL2 / Ubuntu / Win10 | nvcc and nvidia-smi work


For some reason, no Docker container with CUDA can see my GPU.

When I run this:

docker run --gpus=all --rm nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
I get this output:

...
Error: only 0 Devices available, 1 requested.  Exiting.

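The CUDA container cannot find my GPU.

I have found many similar questions on forums, but no satisfactory answers. Has anyone found the cause of this on WSL2 / Docker Desktop / Win10 / Ubuntu 20.04? I have the latest versions of CUDA and the NVIDIA driver, as well as the latest versions of WSL2 and Docker Desktop.

However, both nvidia-smi and nvcc --version work: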

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   53C    P8             16W /  165W |    1045MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        21      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        23      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

This also works, so the problem appears to be purely CUDA-related:

/mnt/c/Users/pavel$ docker run --rm  --gpus=all ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4060 Ti (UUID: GPU-28820d91-b332-b4ba-f1c8-5508048ce1f7)
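
The contrast between the two container runs (nvidia-smi lists the GPU, the CUDA sample reports zero devices) can be captured in a small script. This is only a sketch: the function names and the RUN_GPU_CHECK switch are hypothetical, and the two docker commands are the ones already shown in this question.

```shell
#!/bin/sh
# Count the GPUs listed in `nvidia-smi -L` output read from stdin.
count_listed_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Succeed if stdin contains the CUDA sample's zero-device error line.
cuda_sees_no_device() {
    grep -q 'only 0 Devices available'
}

# RUN_GPU_CHECK is a hypothetical opt-in switch; without it nothing is executed.
if [ "${RUN_GPU_CHECK:-0}" = "1" ]; then
    listed=$(docker run --rm --gpus=all ubuntu nvidia-smi -L | count_listed_gpus)
    echo "GPUs listed by nvidia-smi inside a container: $listed"
    if docker run --gpus=all --rm nvcr.io/nvidia/k8s/cuda-sample:nbody \
        nbody -gpu -benchmark 2>&1 | cuda_sees_no_device; then
        echo "The CUDA sample reports zero devices"
    else
        echo "The CUDA sample did not report the zero-device error"
    fi
fi
```

Running it as `RUN_GPU_CHECK=1 sh check.sh` (the file name is arbitrary) prints both results side by side, which makes it easy to tell whether the GPU is passed through but unusable by CUDA.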

My environment:

wsl --version
Verze WSL: 2.1.5.0
Verze jádra: 5.15.146.1-2
Verze WSLg: 1.0.60
Verze MSRDC: 1.2.5105
Verze Direct3D: 1.611.1-81528511
Verze DXCore: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Verze Windows: 10.0.19045.4412

docker info

Client:
 Version:    26.1.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0-desktop.1
    Path:     C:\Program Files\Docker\cli-plugins\docker-buildx.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0-desktop.2
    Path:     C:\Program Files\Docker\cli-plugins\docker-compose.exe
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.29
    Path:     C:\Program Files\Docker\cli-plugins\docker-debug.exe
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     C:\Program Files\Docker\cli-plugins\docker-dev.exe
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.23
    Path:     C:\Program Files\Docker\cli-plugins\docker-extension.exe
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.4
    Path:     C:\Program Files\Docker\cli-plugins\docker-feedback.exe
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.1.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-init.exe
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-sbom.exe
  scout: Docker Scout (Docker Inc.)
    Version:  v1.8.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-scout.exe

Server:
 Containers: 11
  Running: 6
  Paused: 0
  Stopped: 5
 Images: 47
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
 Kernel Version: 5.15.146.1-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 5
 Total Memory: 39.18GiB
 Name: docker-desktop
 ID: 88425de8-c396-4a90-9fea-afb64822deaa
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=npipe://\\.\pipe\docker_cli
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
WARNING: daemon is not using the default seccomp profile
nvidia-smi
Fri May 31 08:47:59 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   54C    P8             16W /  165W |    1083MiB /  16380MiB |      2%      Default |
|                                         |                        |                  N/A |

Ubuntu 20.04

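I have already tried reinstalling and clean-installing the NVIDIA driver, CUDA Toolkit, NVIDIA Container Toolkit, and so on, many times.

From what I have found, reports differ widely on what has to be installed in a Win10/WSL2 environment for CUDA to work. Some people installed the latest NVIDIA driver. Some install both the CUDA Toolkit for Windows 10 and the CUDA Toolkit for WSL-Ubuntu. In addition, some people had to install the NVIDIA Container Toolkit and others did not.

I am stuck in an endless loop of trying every possible combination of installs, but it seems I am missing something.

Has anyone run into the same situation and found a solution? Thanks!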
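
When comparing install combinations like these, it can help to record which NVIDIA and CUDA packages are actually present in the WSL Ubuntu distro at each attempt. The following is a minimal sketch, assuming a Debian-based distro where dpkg-query is available; the function name is hypothetical.

```shell
#!/bin/sh
# Keep only package names that mention nvidia or cuda (case-insensitive).
filter_gpu_packages() {
    grep -i -E 'nvidia|cuda'
}

# List installed package names and filter them; skip quietly if dpkg-query is missing.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package}\n' | filter_gpu_packages \
        || echo "no NVIDIA or CUDA packages found"
fi
```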

docker cuda gpu nvidia wsl-2
1 Answer

If anyone hits this problem with driver version 555.85 (CUDA 12.5), the solution is to downgrade to 552.22 (CUDA 12.4) (https://www.nvidia.com/download/driverResults.aspx/224154/en-us/).

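If you only need to run CUDA containers, the steps are:

Remove all nvidia and cuda packages from WSL.

Uninstall the CUDA Toolkit in Windows.

Download the 552.22 driver, which includes CUDA 12.4.

Run a clean install.

Reboot.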

docker run --gpus all --rm nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

It should work now :)
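
To confirm which driver is active before and after the downgrade, the version can be queried with nvidia-smi and compared against the two versions mentioned in this answer. This is a sketch, not an exhaustive compatibility check: classify_driver is a hypothetical helper, and it only knows about 555.85 and 552.22 because those are the only versions reported here.

```shell
#!/bin/sh
# Classify a driver version using only the two versions reported in this thread.
classify_driver() {
    case "$1" in
        555.85) echo "affected" ;;
        552.22) echo "known-good" ;;
        *)      echo "unknown" ;;
    esac
}

# Query the installed driver version only if nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader \
        | head -n 1 | tr -d '[:space:]')
    echo "Driver ${version}: $(classify_driver "$version")"
else
    echo "nvidia-smi not found; skipping driver check"
fi
```

Any version other than those two is reported as "unknown", since nothing in this thread says whether it works.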
