I am trying to use cupy inside my Docker container. I use two containers: one for CUDA and cuDNN, and another for cupy.
I tried this code:
import cupy as cp
cupy_array = cp.array([1, 2, 3])
cupy_result = cupy_array + 5
print("CuPy Result:", cupy_result)
The full error log is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cupy/_core/core.pyx", line 1191, in cupy._core.core.ndarray.__add__
  File "cupy/_core/core.pyx", line 1591, in cupy._core.core.ndarray.__array_ufunc__
  File "cupy/_core/_kernel.pyx", line 1292, in cupy._core._kernel.ufunc.__call__
  File "cupy/_core/_kernel.pyx", line 1319, in cupy._core._kernel.ufunc._get_ufunc_kernel
  File "cupy/_core/_kernel.pyx", line 1025, in cupy._core._kernel._get_ufunc_kernel
  File "cupy/_core/_kernel.pyx", line 72, in cupy._core._kernel._get_simple_elementwise_kernel
  File "cupy/_core/core.pyx", line 2141, in cupy._core.core.compile_with_cache
  File "/usr/local/lib/python3.8/dist-packages/cupy/cuda/compiler.py", line 492, in _compile_module_with_cache
    return _compile_with_cache_cuda(
  File "/usr/local/lib/python3.8/dist-packages/cupy/cuda/compiler.py", line 614, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 264, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 266, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 210, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid
Output of nvidia-smi:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4080 Off | 00000000:01:00.0 On | N/A |
| 0% 32C P8 6W / 320W | 483MiB / 16376MiB | 4% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
Output of nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
Output of cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2:
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#endif /* CUDNN_VERSION_H */
Output of pip3 freeze | grep cupy:
cupy-cuda116==10.6.0
All of the output above was taken inside the cupy Docker container.
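One thing that can be cross-checked from the versions above is whether the CUDA 11.6 toolkit in the container can target the RTX 4080 at all. The following is a minimal sketch, not a confirmed diagnosis. The helper names (decode_cuda_version, toolkit_supports, report) and the MIN_TOOLKIT table are made up for illustration, and the table encodes my assumption about the first toolkit release that supports each architecture; only the CuPy calls inside report() are real API.

```python
# Sketch: compare the GPU's compute capability with the CUDA toolkit in use.
# Helper names and the MIN_TOOLKIT table are illustrative assumptions.

def decode_cuda_version(v):
    """Turn CUDA's integer version (e.g. 11060) into a (major, minor) tuple."""
    return v // 1000, (v % 1000) // 10

# Assumption: minimum toolkit release able to target each architecture.
# Only the entries relevant to this question are listed.
MIN_TOOLKIT = {
    "86": (11, 1),  # Ampere, e.g. RTX 30 series
    "89": (11, 8),  # Ada Lovelace, e.g. RTX 40 series
}

def toolkit_supports(compute_capability, toolkit):
    """Return True or False when the architecture is in the table, else None."""
    required = MIN_TOOLKIT.get(compute_capability)
    if required is None:
        return None
    return toolkit >= required

def report():
    """Query the live device through CuPy (needs a GPU and a CuPy install)."""
    import cupy as cp
    cc = cp.cuda.Device(0).compute_capability  # string such as "89"
    runtime = decode_cuda_version(cp.cuda.runtime.runtimeGetVersion())
    driver = decode_cuda_version(cp.cuda.runtime.driverGetVersion())
    print("compute capability:", cc)
    print("CUDA runtime:", runtime)
    print("driver supports up to:", driver)
    print("toolkit can target this GPU:", toolkit_supports(cc, runtime))

if __name__ == "__main__":
    try:
        report()
    except Exception as exc:  # CuPy missing or no usable GPU
        print("could not query the device:", exc)
```

If the last line printed is False, the toolkit is older than the GPU architecture, which would be one plausible explanation for a kernel image the driver refuses to load.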
I start the Docker container for CUDA and cuDNN with:
sudo docker run --name cuda11.6.1-cudnn8 --gpus all --runtime=nvidia -it \
  --privileged --env="DISPLAY=:0:0" -v=/tmp/.X11-unix:/tmp/.X11-unix:ro \
  -v=/home/youngjoo/Documents/Elevation_ws:/home/youngjoo/Documents/Elevation_ws \
  -v=/dev:/dev -w=/home/youngjoo/Documents/Elevation_ws \
  nvidia/cuda:11.6.1-cudnn8-devel-ubuntu20.04
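My OS is Ubuntu 20.04.
The Docker version is 24.0.7, build afdd53b.
How can I solve this problem?
I removed all Docker containers and started over, but the result is the same.
I'm running into the same problem. Did you manage to solve it?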