I am currently trying to train a model on Kaggle using the GPU resources, but it seems that only one GPU is being used instead of both. I am using the following training code:
# Step 1: Install the required packages
#!pip install ultralytics xmltodict albumentations torch torchvision torchaudio
# Step 5: Train the YOLO model
import os
import torch
from ultralytics import YOLO
# Set WANDB_MODE to 'dryrun' to disable WandB logging
os.environ['WANDB_MODE'] = 'dryrun'
# Set up device for multiple GPUs
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = YOLO('yolov8x.pt') # load a pretrained YOLOv8 model
# Check if multiple GPUs are available
if torch.cuda.device_count() > 1:
    print(f"Using {torch.cuda.device_count()} GPUs")
    model = torch.nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count()))).to(device)
else:
    model = model.to(device)
# Define the training configuration
data_yaml = """
train: /../images/train_combined_data
val: /../images/val
test: /../images/test
nc: 1
names: ['Hotspot']
"""
with open('data.yaml', 'w') as f:
    f.write(data_yaml)
# Train the model
model.train(
    data='data.yaml',
    epochs=50,   # Total number of training epochs
    batch=16,
    imgsz=640,   # Target image size for training
    device='cuda'
)
I checked Kaggle's documentation, and it should support training with multiple GPUs. Do I need to add something specific to my code to enable multi-GPU training, or is there a setting on Kaggle that I might have missed?
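From my reading of the Ultralytics docs, it looks like multi-GPU (DDP) training is requested by passing a list of device indices to train(), rather than wrapping the model in DataParallel, but I'm not sure whether that is the intended approach on Kaggle. A minimal sketch of what I have in mind, assuming both GPUs are visible as indices 0 and 1:

from ultralytics import YOLO

model = YOLO('yolov8x.pt')  # load a pretrained YOLOv8 model

# Request DDP training on GPUs 0 and 1
# (assumes both Kaggle GPUs show up as cuda:0 and cuda:1)
model.train(
    data='data.yaml',
    epochs=50,
    batch=16,        # total batch size, split across the GPUs
    imgsz=640,
    device=[0, 1]    # device list instead of a single 'cuda' string
)

Is this the right direction, or is there something else needed on the Kaggle side?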
Any help or guidance on this issue would be greatly appreciated. Thanks!
How can I use both GPUs at the same time?
Hi, I don't know much about this, but I ran into the same problem when using the T4 GPUs: only one of them showed any usage. So I switched to the P100 GPU, which I think has more VRAM. Thanks.
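For what it's worth, a quick way to see which GPUs PyTorch can actually see and how much memory each one has (plain torch.cuda calls, nothing Kaggle-specific):

import torch

# List every CUDA device visible to PyTorch with its name and total memory
print(torch.cuda.device_count(), 'GPU(s) visible')
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f'cuda:{i} -> {props.name}, {props.total_memory / 1024**3:.1f} GB')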