I want to make better use of my GPU by tuning the number of DataLoader workers, but I run into a problem whenever num_workers > 0.
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=0)
- no problem
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=1)
- RuntimeError: CUDA error: initialization error
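One thing that may be worth ruling out for the `num_workers=1` case: DataLoader worker processes are started through Python's multiprocessing, whose default start method on Linux has traditionally been `fork`, and the PyTorch documentation notes that CUDA cannot be used in a forked subprocess once it has been initialized in the parent. The sketch below only inspects the start methods with the standard library and picks a value for DataLoader's `multiprocessing_context` argument. `worker_context` is a hypothetical helper, not part of PyTorch, and whether this is the actual cause here is an assumption.

```python
import multiprocessing as mp


def worker_context(num_workers: int, preferred: str = "spawn"):
    """Pick a value for DataLoader's multiprocessing_context argument.

    Hypothetical helper for illustration only.
    """
    if num_workers <= 0:
        # DataLoader rejects a non-None context when there are no workers.
        return None
    if preferred in mp.get_all_start_methods():
        return preferred
    # Preferred method not available: fall back to the platform default.
    return None


if __name__ == "__main__":
    print("default start method:", mp.get_start_method())
    print("available start methods:", mp.get_all_start_methods())
    print("context for num_workers=1:", worker_context(1))
    # Assumed use with the loader from the question (not run here):
    # test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False,
    #                          num_workers=1,
    #                          multiprocessing_context=worker_context(1))
```

With `spawn`, the script that builds the loader needs the `if __name__ == "__main__":` guard shown above. If the dataset or `collate_fn` creates CUDA tensors, keeping them on the CPU inside the workers and moving each batch to the GPU in the main loop is another common way to avoid this error.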
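Hardware and software: NVIDIA GeForce RTX 3060 Ti, Torch 2.0.1 + CUDA 11.8, GPU memory 8191.50 MB.
I am working in Visual Studio Code under WSL.
A similar case is referenced here: "Error when using PyTorch num_workers in Colab when num_workers > 0".
Any suggestions? Thank you!
>>> EDIT <<<
Demo script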
import tensorflow as tf
# Check GPU availability
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# Simple model to test GPU functionality
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Dummy data
import numpy as np
X_train = np.random.random((1000, 784))
Y_train = np.random.randint(10, size=(1000,))
# Train model
model.fit(X_train, Y_train, epochs=1, workers=1, use_multiprocessing=True)
Output
File "/home/kalin_stoyanov/EntMax_TSNE/my_venv/lib/python3.11/site-packages/keras/src/optimizers/optimizer.py", line 1253, in _internal_apply_gradients
File "/home/kalin_stoyanov/EntMax_TSNE/my_venv/lib/python3.11/site-packages/keras/src/optimizers/optimizer.py", line 1345, in _distributed_apply_gradients_fn
File "/home/kalin_stoyanov/EntMax_TSNE/my_venv/lib/python3.11/site-packages/keras/src/optimizers/optimizer.py", line 1340, in apply_grad_to_update_var
DNN library initialization failed. Look at the errors above for more details.
**System info**
---- CPU Information ----
Physical cores: 12
Logical cores: 24
---- Memory Information ----
Total RAM (GB): 15.442893981933594
Available RAM (GB): 12.711883544921875
---- GPU Information ----
PyTorch GPU Count: 1
PyTorch GPU 0: NVIDIA GeForce RTX 3060 Ti
PyTorch Version: 2.0.1+cu118
TensorFlow GPUs: ['/device:GPU:0']
TensorFlow Version: 2.14.0
Keras Version: 2.14.0
---- Operating System Information ----
Operating System: Linux
OS Version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Machine: x86_64
---- Python Information ----
Python Version: 3.11.4
Python Executable: /home/kalin_stoyanov/EntMax_TSNE/my_venv/bin/python
PyTorch Version: 2.0.1+cu118
TensorFlow Version: 2.14.0
Keras Version: 2.14.0
---- Num GPUs Available ----
Num GPUs Available (TensorFlow): 1
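Please consider this question resolved. I was really tired and mixed up my PyTorch and TensorFlow code. Sorry! The problem I am actually running into is with TensorFlow, not PyTorch. If there is no solution here, I will open a new question with reproducible code.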