Using a FiftyOne dataset with a PyTorch DataLoader


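I want to do some initial training on bounding boxes from a larger public dataset. FiftyOne looked like a good starting point, but I am having trouble getting it to work with PyTorch. I suspect I am missing something minor; I tried following the official GitHub. I am starting to think the example in that GitHub repository is outdated, because it fails even when I use the example class unchanged. The problem is the access method, where the only quasi-numeric options are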

dataset.first()
dataset.last()

Problem: the data loader throws an exception

KeyError: 'Accessing samples by numeric index is not supported. Use sample IDs, filepaths, slices, boolean arrays, or a boolean ViewExpression instead'

I modified the example slightly so that it only uses the car subset:

class FiftyOneDS(torch.utils.data.Dataset):
    def __init__(
        self,
        fiftyone_ds,
        transforms=None,
        gt_field="ground_truth",
        classes=None,
    ):
        self.samples = fiftyone_ds
        self.transforms = transforms
        self.gt_field = gt_field
        self.classes = classes  # don't care
        self.img_paths = self.samples.values("filepath")
        
    def __getitem__(self, idx):
        img_path = self.img_paths[idx]
        sample = self.samples[idx]
        metadata = sample.metadata
        img = Image.open(img_path).convert("RGB")
        boxes = []
        labels = []
        detections = sample[self.gt_field].detections
        for det in detections:
            if det["label"] != "car":
                continue
            category_id = self.labels_map_rev[det.label]
            coco_obj = fouc.COCOObject.from_label(
                det, metadata, category_id=category_id,
            )
            x, y, w, h = coco_obj.bbox
            boxes.append([x, y, x + w, y + h])
            labels.append(coco_obj.category_id)
        target = {}
        target["boxes"] = torch.as_tensor(boxes, dtype=torch.float32)
        target["labels"] = torch.as_tensor(labels, dtype=torch.int64)
        target["image_id"] = torch.as_tensor([idx])
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target
    
        
    def __len__(self):
        return len(self.img_paths)

Then I use the dataset in the following snippet (constructing it seems to work fine):

carset = FiftyOneDS(dataset)
print("type:", type(torch_dataset_test))
# type: <class 'fiftyone.core.view.DatasetView'>
print("first elem:", torch_dataset_test[0])
# KeyError: 'Accessing samples by numeric index is not supported. Use sample IDs, filepaths, slices, boolean arrays, or a boolean ViewExpression instead'

How can I rewrite my dataset class so that it works with a PyTorch DataLoader?

pytorch fiftyone
1 Answer

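FiftyOne does not support accessing samples by numeric index, as the error message says. One solution is to look the sample up by its filepath instead. In `__getitem__`, `img_path` is already taken from `self.img_paths[idx]`, so replacing `self.samples[idx]` with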

sample = self.samples[img_path]

should solve the problem. Read more here: https://docs.voxel51.com/user_guide/using_views.html#slicing
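To illustrate the pattern on its own, here is a minimal sketch that runs without FiftyOne or PyTorch installed. `FakeSampleCollection` is a hypothetical stand-in (not part of FiftyOne) that only mimics the two behaviours relevant here: `values("filepath")` returns the list of paths, and lookups work by filepath but raise `KeyError` for an integer. `PathIndexedDataset` and the sample data are likewise made up for illustration.

```python
class FakeSampleCollection:
    """Hypothetical stand-in for a FiftyOne dataset or view.

    Lookups by filepath work; lookups by integer position raise KeyError,
    mirroring the error shown in the question.
    """

    def __init__(self, samples):
        # samples: list of dicts, each with a "filepath" key
        self._by_path = {s["filepath"]: s for s in samples}

    def values(self, field):
        # Return one field from every sample, in insertion order
        return [s[field] for s in self._by_path.values()]

    def __getitem__(self, key):
        if isinstance(key, int):
            raise KeyError("Accessing samples by numeric index is not supported.")
        return self._by_path[key]


class PathIndexedDataset:
    """Map-style dataset: integer idx -> filepath -> sample."""

    def __init__(self, samples):
        self.samples = samples
        # Cache the filepaths once; this list is what the integer index hits
        self.img_paths = samples.values("filepath")

    def __getitem__(self, idx):
        img_path = self.img_paths[idx]    # plain list, so an int index is fine
        sample = self.samples[img_path]   # look the sample up by filepath
        return img_path, sample["label"]

    def __len__(self):
        return len(self.img_paths)


collection = FakeSampleCollection([
    {"filepath": "/data/a.jpg", "label": "car"},
    {"filepath": "/data/b.jpg", "label": "truck"},
])
ds = PathIndexedDataset(collection)
```

In the real class from the question, the same one-line change applies: keep `self.img_paths = self.samples.values("filepath")` and index `self.samples` with `img_path` rather than `idx`. A PyTorch DataLoader only calls `__getitem__` with integers and `__len__`, so it never touches the FiftyOne collection by position.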
