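I'm training a YOLO model in Python with the SuperGradients library. I created DataLoader objects for the training and validation datasets, but when I pass them to the trainer.train() method I get the following error:
Log summary:
TypeError: 'DataLoader' object is not subscriptable
Full log trace: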
[2024-08-27 07:35:44] WARNING - sg_trainer.py - Train dataset size % batch_size != 0 and drop_last=False, this might result in smaller last batch.
The console stream is now moved to /content/drive/MyDrive/MerchanMe/IATraining/Dados/ModelosProdutos/checkpoints/yolo_nas_version_m/console_Aug27_07_35_44.txt
[2024-08-27 07:35:46] INFO - sg_trainer.py - Using EMA with params {'decay': 0.9, 'decay_type': 'threshold'}
An error occurred during training: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
TypeError: 'DataLoader' object is not subscriptable
Traceback:
Traceback (most recent call last):
File "<ipython-input-17-a2a5c064ba0b>", line 5, in <cell line: 4>
trainer.train(
File "/usr/local/lib/python3.10/dist-packages/super_gradients/training/sg_trainer/sg_trainer.py", line 1323, in train
first_batch = next(iter(self.train_loader))
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
data.reraise()
File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 706, in reraise
raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
TypeError: 'DataLoader' object is not subscriptable
My code for creating the data loaders:
from super_gradients.training.dataloaders.dataloaders import coco_detection_yolo_format_train, coco_detection_yolo_format_val
from torch.utils.data import ConcatDataset, DataLoader

# List of dataset folders containing COCO datasets
yolo_folders = [f'{LOCATION}/dataset1', f'{LOCATION}/dataset2', f'{LOCATION}/dataset3', f'{LOCATION}/dataset4']

# Load YOLO TRAIN datasets from each folder
train_datasets = []
for folder in yolo_folders:
    dataset = coco_detection_yolo_format_train(
        dataset_params={
            'data_dir': folder,
            'images_dir': f'{folder}/train/images',
            'labels_dir': f'{folder}/train/labels',
            'classes': dataset_params['classes'],
            'input_dim': (640, 640)
        },
        dataloader_params={
            'batch_size': BATCH_SIZE,
            'num_workers': 2
        }
    )
    train_datasets.append(dataset)

# Combine the training datasets
combined_train_dataset = ConcatDataset(train_datasets)

# Create a DataLoader for the combined training dataset
train_dataloader = DataLoader(combined_train_dataset, batch_size=16, shuffle=True, num_workers=4)
My code for creating the model and calling Trainer.train():
import torch
from super_gradients.training import models
from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback
from super_gradients.training import Trainer

model = models.get(MODEL_ARCH, num_classes=len(dataset_params['classes']), pretrained_weights="coco").to(DEVICE)

train_params = {
    # Silent mode: set to True to suppress console output
    'silent_mode': False,
    "average_best_models": True,
    "warmup_mode": "linear_epoch_step",
    "warmup_initial_lr": 1e-6,
    "lr_warmup_epochs": 3,
    "initial_lr": 5e-4,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.1,
    "optimizer": "Adam",
    "optimizer_params": {"weight_decay": 0.0001},
    "zero_weight_decay_on_bias_and_bn": True,
    "ema": True,
    "ema_params": {"decay": 0.9, "decay_type": "threshold"},
    "max_epochs": 20,
    "mixed_precision": False,  # Set to True when training on a GPU to speed it up
    "loss": PPYoloELoss(
        use_static_assigner=False,
        num_classes=len(dataset_params['classes']),
        reg_max=16
    ),
    "valid_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            # NOTE: num_classes needs to be defined here
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        )
    ],
    "metric_to_watch": 'mAP@0.50'
}

trainer = Trainer(experiment_name='yolo_nas_version_m', ckpt_root_dir=CHECKPOINT_DIR)
trainer.train(
    model=model,
    training_params=train_params,
    train_loader=train_dataloader,
    valid_loader=val_dataloader
)
If I do it the following way it works, but then I can only use a single dataset directly:
trainer.train(
    model=model,
    training_params=train_params,
    train_loader=train_datasets[0],
    valid_loader=val_datasets[0]
)
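Any suggestions would be appreciated. I've been trying to fix this for a week and have already tried many solutions.
P.S.: I know a simple fix would be to merge everything into a single dataset folder and call trainer.train() with that one dataset instead of a combined DataLoader. But the project keeps growing, and I need to keep these datasets split so that I can train or test with only some of them, or with other datasets.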
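The trainer expects a DataLoader wrapped around a Dataset. However, coco_detection_yolo_format_train already returns a DataLoader (which is why passing train_datasets[0] directly works), so train_datasets is a list of DataLoaders. ConcatDataset accepts them, but when the outer DataLoader fetches combined_train_dataset[idx], ConcatDataset tries to index one of the inner DataLoaders. DataLoader does not support indexing, hence TypeError: 'DataLoader' object is not subscriptable.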
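The failure can be reproduced with plain PyTorch, without SuperGradients. In this sketch a toy TensorDataset wrapped in a DataLoader stands in for what coco_detection_yolo_format_train returns; num_workers=0 is used so the TypeError surfaces directly instead of being re-raised from a worker process.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-in for a SuperGradients loader: a DataLoader, not a Dataset
inner_loader = DataLoader(TensorDataset(torch.arange(10)), batch_size=4)

# ConcatDataset only needs len() at construction time, so this is accepted...
combined = ConcatDataset([inner_loader, inner_loader])

# ...but fetching a batch calls combined[i], which calls inner_loader[j]
outer_loader = DataLoader(combined, batch_size=2, num_workers=0)

try:
    next(iter(outer_loader))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'DataLoader' object is not subscriptable
```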
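If you only want to train on a subset, there are several possible approaches, from writing a custom Dataset class to using a custom Sampler that only returns indices within the range of the selected datasets.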
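A minimal sketch of the Sampler approach, assuming all folders are concatenated once and you pick which ones to sample from: ConcatDataset.cumulative_sizes gives each sub-dataset's index range, and SubsetRandomSampler draws only from the chosen ranges. The TensorDatasets are toy stand-ins and indices_for is a hypothetical helper; with SuperGradients you would build the ConcatDataset from loader.dataset for each loader in train_datasets and pass the original loader's collate_fn.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, SubsetRandomSampler, TensorDataset


def indices_for(concat: ConcatDataset, selected: list[int]) -> list[int]:
    """Hypothetical helper: global indices of samples in the selected sub-datasets."""
    starts = [0] + concat.cumulative_sizes[:-1]
    indices = []
    for i in selected:
        indices.extend(range(starts[i], concat.cumulative_sizes[i]))
    return indices


# Toy stand-ins for four dataset folders of different sizes; every sample of part k has value k
parts = [TensorDataset(torch.full((n,), k)) for k, n in enumerate([3, 5, 2, 4])]
combined = ConcatDataset(parts)

# Only draw samples from sub-dataset 0 and sub-dataset 2
sampler = SubsetRandomSampler(indices_for(combined, [0, 2]))
# shuffle must stay False (the default) when a sampler is given
loader = DataLoader(combined, batch_size=2, sampler=sampler)

seen = sorted(int(x) for (batch,) in loader for x in batch)
print(seen)  # [0, 0, 0, 2, 2]
```

The sampler already randomizes the order within the selected ranges, so switching datasets on or off only changes the list passed to indices_for.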
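The quickest way is to create a DataLoader over the dataset you want to use. Because each entry of train_datasets (and val_datasets) is already a DataLoader, unwrap it with .dataset first, and reuse its collate_fn so the detection targets are batched the same way SuperGradients batches them:
trainer.train(
    model=model,
    training_params=train_params,
    # Each entry is a DataLoader: wrap its underlying Dataset, not the loader itself
    train_loader=DataLoader(train_datasets[0].dataset, batch_size=16, shuffle=True,
                            num_workers=4, collate_fn=train_datasets[0].collate_fn),
    valid_loader=DataLoader(val_datasets[0].dataset, batch_size=16, shuffle=False,
                            num_workers=4, collate_fn=val_datasets[0].collate_fn),
)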
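The same unwrapping works for combining all folders. The sketch below assumes every loader in the list was built with the same parameters, so the first loader's collate_fn is valid for all of them (SuperGradients detection loaders likely use a custom collate function rather than PyTorch's default). combine_loaders is a hypothetical helper name, and the toy TensorDatasets only stand in for the real loaders.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset


def combine_loaders(loaders, batch_size=16, shuffle=True, num_workers=0):
    """Hypothetical helper: one DataLoader over the Datasets behind several DataLoaders."""
    # DataLoader.dataset is the Dataset the loader was built from
    combined = ConcatDataset([loader.dataset for loader in loaders])
    # Assumption: all loaders share the same collate_fn, so reuse the first one
    return DataLoader(
        combined,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,
        collate_fn=loaders[0].collate_fn,
    )


# Toy check with plain tensors in place of the SuperGradients loaders
loaders = [DataLoader(TensorDataset(torch.arange(n)), batch_size=4) for n in (3, 5)]
combined_loader = combine_loaders(loaders, batch_size=2, shuffle=False)
print(len(combined_loader.dataset))  # 8
```

With the question's variables this would be train_dataloader = combine_loaders(train_datasets, batch_size=BATCH_SIZE, shuffle=True, num_workers=4), and the validation loaders would be combined the same way with shuffle=False. Choosing a subset is then just a matter of slicing the list, e.g. combine_loaders(train_datasets[:2]).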