CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid


I am trying to use cupy inside a docker container. I use two containers: one for CUDA and cuDNN, and another for cupy.

I tried this code:

import cupy as cp

cupy_array = cp.array([1, 2, 3])
cupy_result = cupy_array + 5 
print("CuPy Result:", cupy_result)

The full error log is:

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>
  File "cupy/_core/core.pyx", line 1191, in cupy._core.core.ndarray.__add__
  File "cupy/_core/core.pyx", line 1591, in cupy._core.core.ndarray.__array_ufunc__
  File "cupy/_core/_kernel.pyx", line 1292, in cupy._core._kernel.ufunc.__call__
  File "cupy/_core/_kernel.pyx", line 1319, in cupy._core._kernel.ufunc._get_ufunc_kernel
  File "cupy/_core/_kernel.pyx", line 1025, in cupy._core._kernel._get_ufunc_kernel
  File "cupy/_core/_kernel.pyx", line 72, in cupy._core._kernel._get_simple_elementwise_kernel
  File "cupy/_core/core.pyx", line 2141, in cupy._core.core.compile_with_cache
  File "/usr/local/lib/python3.8/dist-packages/cupy/cuda/compiler.py", line 492, in _compile_module_with_cache
    return _compile_with_cache_cuda(
  File "/usr/local/lib/python3.8/dist-packages/cupy/cuda/compiler.py", line 614, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 264, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 266, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 210, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid
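
The traceback points at the addition (kernel compilation followed by `Module.load`) rather than at the array creation. One way to confirm which stage fails is to run the stages separately. The following is only a minimal sketch: the helper names `first_failing_step` and `cupy_steps` are hypothetical, and the split into three stages is an assumption based on the traceback, not something CuPy documents.

```python
def first_failing_step(steps):
    """Run (name, callable) pairs in order.

    Returns (name, exception) for the first step that raises,
    or None if every step succeeds.
    """
    for name, func in steps:
        try:
            func()
        except Exception as exc:  # broad on purpose: we only want to locate the failure
            return name, exc
    return None


def cupy_steps():
    """Hypothetical breakdown of the snippet from the question into stages."""
    import cupy as cp

    state = {}

    def allocate():
        # Host-to-device copy; judging by the traceback, this stage succeeded.
        state["a"] = cp.array([1, 2, 3])

    def add():
        # Elementwise add: CuPy compiles a kernel and loads it onto the device.
        state["r"] = state["a"] + 5

    def copy_back():
        # Device-to-host copy of the result.
        state["host"] = cp.asnumpy(state["r"])

    return [("allocate", allocate), ("add", add), ("copy back", copy_back)]


if __name__ == "__main__":
    try:
        print(first_failing_step(cupy_steps()))
    except ImportError:
        print("CuPy is not installed in this environment")
```

If the reported stage is "add", the failure is in compiling or loading the kernel, which matches the `moduleLoadData` frame in the log above.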

Output of nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4080        Off | 00000000:01:00.0  On |                  N/A |
|  0%   32C    P8               6W / 320W |    483MiB / 16376MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                     
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

Output of cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2:

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#endif /* CUDNN_VERSION_H */

The output of pip3 freeze | grep cupy is cupy-cuda116==10.6.0.

All of the results above were obtained inside the cupy docker container.

I start the CUDA and cuDNN container with:
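
The versions above come from several different tools (nvidia-smi, nvcc, pip). As a cross-check, the same information can be read from inside Python, which shows what CuPy itself sees at runtime. This is a minimal sketch, assuming CuPy imports in the container; `decode_cuda_version` and `report` are hypothetical helper names. CUDA reports versions as the integer 1000 * major + 10 * minor, so 11060 means 11.6.

```python
def decode_cuda_version(version):
    """Convert CUDA's integer encoding (1000*major + 10*minor) to (major, minor)."""
    return version // 1000, (version % 1000) // 10


def report():
    """Print the versions CuPy sees. Returns False if they could not be read."""
    try:
        import cupy as cp
    except ImportError:
        print("CuPy is not installed in this environment")
        return False
    try:
        print("CuPy:", cp.__version__)
        print("CUDA runtime:", decode_cuda_version(cp.cuda.runtime.runtimeGetVersion()))
        print("CUDA driver:", decode_cuda_version(cp.cuda.runtime.driverGetVersion()))
        print("NVRTC:", cp.cuda.nvrtc.getVersion())
        # compute_capability is a string such as "86" (major and minor digits joined)
        print("Compute capability:", cp.cuda.Device(0).compute_capability)
    except Exception as exc:  # e.g. no visible GPU or a missing shared library
        print("Could not query CUDA through CuPy:", exc)
        return False
    return True


if __name__ == "__main__":
    report()
```

Comparing the reported compute capability of the GPU with the architectures supported by the installed toolkit and NVRTC version is one possible next step when a compiled kernel fails to load.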

sudo docker run --name cuda11.6.1-cudnn8 --gpus all --runtime=nvidia -it \
  --privileged --env="DISPLAY=:0:0" -v=/tmp/.X11-unix:/tmp/.X11-unix:ro \
  -v=/home/youngjoo/Documents/Elevation_ws:/home/youngjoo/Documents/Elevation_ws \
  -v=/dev:/dev -w=/home/youngjoo/Documents/Elevation_ws \
  nvidia/cuda:11.6.1-cudnn8-devel-ubuntu20.04

My OS is Ubuntu 20.04.

The Docker version is 24.0.7, build afdd53b.

How can I solve this problem?

I deleted all docker containers and started over, but the result is the same.

python docker cudnn cupy
1 Answer

I am running into the same problem. Did you manage to solve it?
