Reading .h5 files and using them in a Dataset

Question

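I have two folders (one for training, one for testing), each containing about 10 files in h5 format. I want to read them and use them in a dataset. I have a function that reads them, but I don't know how to use it to read the files inside my class.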

import h5py

def read_h5(path):
    # Open read-only and close the file once both arrays are loaded
    with h5py.File(path, 'r') as data:
        image = data['image'][:]
        label = data['label'][:]
    return image, label

class Myclass(Dataset):
    def __init__(self, split='train', transform=None):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError

    def __getitem__(self, index):
        raise NotImplementedError

Do you have any suggestions? Thanks in advance.

Answer 1

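This may be a starting point for what you want to do. I implemented __init__(), but not __len__() or __getitem__(). The user supplies a path, and __init__() calls the class method read_h5() to get the arrays of image and label data. A short main section creates class objects from 2 different H5 files. Modify the paths list so that it contains the folders and filenames of all your training and testing data.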

import h5py

class H5_data():
    def __init__(self, path): #split='train', transform=None):
        self.path = path
        # Load both datasets into memory as NumPy arrays
        self.image, self.label = H5_data.read_h5(path)

    @classmethod
    def read_h5(cls, path):
        with h5py.File(path, 'r') as data:
            # [()] reads the entire dataset into a NumPy array
            image = data['image'][()]
            label = data['label'][()]
            return image, label

paths = ['train_0.h5', 'test_0.h5']
for path in paths:
    h5_test = H5_data(path)
    print(f'For HDF5 file: {path}')
    print(f'image data, shape: {h5_test.image.shape}; dtype: {h5_test.image.dtype}')
    print(f'label data, shape: {h5_test.label.shape}; dtype: {h5_test.label.dtype}')
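Rather than typing every filename into the paths list by hand, the list could be built from the folder contents. This is only a sketch: the helper name list_h5_files is made up for illustration, and it assumes the files sit directly inside each folder and end in .h5.

```python
from pathlib import Path


def list_h5_files(folder):
    """Return the sorted paths (as strings) of all .h5 files directly inside `folder`.

    Hypothetical helper, not part of h5py or of the code above.
    """
    return sorted(str(p) for p in Path(folder).glob('*.h5'))
```

If the two folders were named train and test (an assumption, since the question does not give their names), `paths = list_h5_files('train') + list_h5_files('test')` would replace the hard-coded list.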

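IMHO, creating a class that holds the array data is overkill (and could cause memory problems if you have very large datasets). It is more memory-efficient to create h5py dataset objects and access the data only when you need it. The example below does the same as the code above, but without creating class objects that hold NumPy arrays.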

import h5py

paths = ['train_0.h5', 'test_0.h5']
for path in paths:
    with h5py.File(path, 'r') as data:
        # h5py dataset objects: no data is read until they are sliced,
        # and they are only valid while the file is open
        image = data['image']
        label = data['label']
        print(f'For HDF5 file: {path}')
        print(f'image data, shape: {image.shape}; dtype: {image.dtype}')
        print(f'label data, shape: {label.shape}; dtype: {label.dtype}')
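To connect this back to the class in the question, here is one possible way to fill in __init__(), __len__() and __getitem__() using the same read-on-demand idea. It is a sketch under several assumptions that the question does not confirm: the class name H5Dataset and the root argument are made up, the files live in <root>/<split>/ and end in .h5, Dataset is torch.utils.data.Dataset, and the first axis of 'image' and 'label' indexes the samples within each file. If each file holds a single sample instead, the index arithmetic would need to change.

```python
import bisect
from pathlib import Path

import h5py

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch also runs without PyTorch installed
    Dataset = object


class H5Dataset(Dataset):
    """Map-style dataset over every .h5 file in <root>/<split> (hypothetical layout)."""

    def __init__(self, root, split='train', transform=None):
        self.files = sorted(str(p) for p in Path(root, split).glob('*.h5'))
        self.transform = transform
        # Cumulative sample counts, used to map a global index to (file, row).
        # Only the shapes are read here, not the image data itself.
        self.offsets = []
        total = 0
        for path in self.files:
            with h5py.File(path, 'r') as data:
                total += data['image'].shape[0]  # assumes axis 0 = samples
            self.offsets.append(total)

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the file that contains this global index
        file_idx = bisect.bisect_right(self.offsets, index)
        start = self.offsets[file_idx - 1] if file_idx > 0 else 0
        local = index - start
        # Read just one sample from disk
        with h5py.File(self.files[file_idx], 'r') as data:
            image = data['image'][local]
            label = data['label'][local]
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

Opening the file inside __getitem__() keeps memory use low and avoids sharing an open h5py handle between DataLoader worker processes, at the cost of reopening a file for every sample. For about 10 files that fit in memory, loading everything once in __init__() with read_h5() would be simpler.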
