"RuntimeError: CUDA failed with error no CUDA-capable device is detected"


I'm new to Docker, so forgive me if this is a silly question. Recently I've been trying to test faster-whisper, a reimplementation of OpenAI's Whisper. To do this, I'm using a Docker container based on the nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04 image from Docker Hub, on Ubuntu under WSL2. The image builds successfully, but when I run it I get this error:

==========
== CUDA ==
==========

CUDA Version 11.7.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Traceback (most recent call last):
  File "/home/user/Documents/experiment/./main.py", line 8, in <module>
    model = WhisperModel(model_size, device="cuda", compute_type="float16")
  File "/usr/local/lib/python3.10/dist-packages/faster_whisper/transcribe.py", line 128, in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: CUDA failed with error no CUDA-capable device is detected

I've googled around and tried many possible solutions, even completely reinstalling the NVIDIA driver, nvidia-container-toolkit, and Docker, but the error persists.
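One way to narrow this down (a diagnostic sketch, not from the original post) is to ask ctranslate2 directly, inside the container, how many CUDA devices it can see. faster-whisper runs on top of ctranslate2, so if this reports 0 the failure is below faster-whisper itself:

```python
# Diagnostic: check whether ctranslate2 (the backend faster-whisper
# uses) can see any CUDA device inside the container.
try:
    import ctranslate2
    cuda_devices = ctranslate2.get_cuda_device_count()
except ImportError:
    cuda_devices = None  # ctranslate2 is not installed here

print("CUDA devices visible to ctranslate2:", cuda_devices)
```

If this prints 0 while `nvidia-smi` works in the same container, the problem is in how the CUDA runtime libraries are exposed to the process rather than in the faster-whisper code.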

Here is the Dockerfile I used to build the image:

FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04

RUN apt -y update && apt -y install python3.11 python3-pip

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES utility,compute

WORKDIR /home/user/Documents/experiment

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD [ "python3", "./main.py" ]
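As an aside, separate from the CUDA error: this Dockerfile installs python3.11, but `python3-pip` and the `python3` in CMD resolve to Ubuntu 22.04's default Python 3.10 (the traceback also shows `/usr/local/lib/python3.10/dist-packages`), so the 3.11 install goes unused. A sketch that pins a single interpreter might look like this (assuming Python 3.10 is acceptable and the paths stay unchanged):

```dockerfile
FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04

# Use the distribution's default Python 3.10 consistently,
# instead of installing python3.11 next to python3.10's pip.
RUN apt-get -y update && apt-get -y install python3 python3-pip

ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=utility,compute

WORKDIR /home/user/Documents/experiment

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD [ "python3", "./main.py" ]
```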

Here is the Python script, mostly copy-pasted from the official GitHub page:

from faster_whisper import WhisperModel
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"

model_size = "large-v2"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("./audio.wav", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Here is the command I'm using to run the container:

docker run --gpus all --runtime=nvidia -t nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04
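Note that this command starts the bare nvidia/cuda base image rather than the image built from the Dockerfile above. Assuming the built image was given a tag (the tag `whisper-test` here is hypothetical), running the application would look like:

```shell
# Build the image from the Dockerfile in the current directory,
# then run it with GPU access. "whisper-test" is a hypothetical tag.
docker build -t whisper-test .
docker run --gpus all --runtime=nvidia -t whisper-test
```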

Here is the nvidia-smi output from the command
docker run --gpus all --runtime=nvidia -t nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04 nvidia-smi
:

Sun Oct  8 22:08:24 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.112                Driver Version: 537.42       CUDA Version: 11.7     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     On  | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P8              21W / 400W |    522MiB / 12288MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        32      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

Any ideas?

docker windows-subsystem-for-linux openai-whisper
1 Answer

You may need to check which CUDA version your application environment expects: specify it for torch or whatever framework you use. For example, if your environment needs cu11.8 but your installed CUDA version is 12.1, or your device does not support 11.8, this error can be raised.

Start by checking both your local environment and your application's environment.
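Along these lines, a small sketch (not from the answer; the module names are the usual ones for this stack) to print, inside the container, which CUDA-facing packages are installed and at what versions, so you can compare them against the CUDA 11.7 runtime in the base image:

```python
# Print versions of CUDA-facing packages installed in the container,
# to compare against the image's CUDA 11.7 runtime.
import importlib

versions = {}
for name in ("torch", "ctranslate2", "faster_whisper"):
    try:
        mod = importlib.import_module(name)
        versions[name] = getattr(mod, "__version__", "unknown")
    except ImportError:
        versions[name] = "not installed"

for name, ver in versions.items():
    print(f"{name}: {ver}")
```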
