Why does my DataLoader process use as much as 2.6 GB of virtual memory, and is there a way to reduce it?

Each DataLoader worker process occupies 2.6 GB of virtual memory, so 4 workers occupy 10.4 GB in total.

    from torch.utils.data import DataLoader
    from transformers import AutoModelForZeroShotImageClassification, AutoProcessor
    ...
    dataset = ImageDataset(image, clip_processor, SCAN_PROCESS_BATCH_SIZE, test_times)
    dataloader = DataLoader(dataset, batch_size=SCAN_PROCESS_BATCH_SIZE, num_workers=4)
    from torch.utils.data import Dataset

    class ImageDataset(Dataset):
        def __init__(self, image, processor, scan_process_batch_size, test_times):
            self.image = image
            self.processor = processor
            self.scan_process_batch_size = scan_process_batch_size
            self.test_times = test_times

        def __len__(self):
            return self.scan_process_batch_size * self.test_times

        def __getitem__(self, idx):
            # Use the processor to preprocess the (single, fixed) image
            inputs = self.processor(
                images=[self.image],
                return_tensors="pt",
                padding=True
            )['pixel_values']
            return inputs

I inspected the process with Process Hacker: in its memory view, the private column shows a large number of 32 MB rows. I saved one of these regions and found that the 32 MB of memory was entirely zeros.

I found that the virtual memory grows after this code (the worker-spawning loop in PyTorch's DataLoader) runs:

        for i in range(self._num_workers):
            # No certainty which module multiprocessing_context is
            index_queue = multiprocessing_context.Queue()  # type: ignore[var-annotated]
            # Need to `cancel_join_thread` here!
            # See sections (2) and (3b) above.
            index_queue.cancel_join_thread()
            w = multiprocessing_context.Process(
                target=_utils.worker._worker_loop,
                args=(self._dataset_kind, self._dataset, index_queue,
                      self._worker_result_queue, self._workers_done_event,
                      self._auto_collation, self._collate_fn, self._drop_last,
                      self._base_seed, self._worker_init_fn, i, self._num_workers,
                      self._persistent_workers, self._shared_seed))
            w.daemon = True
            # NB: Process.start() actually take some time as it needs to
            #     start a process and pass the arguments over via a pipe.
            #     Therefore, we only add a worker to self._workers list after
            #     it started, so that we do not call .join() if program dies
            #     before it starts, and __del__ tries to join but will get:
            #     AssertionError: can only join a started process.
            w.start()
            self._index_queues.append(index_queue)
            self._workers.append(w)
Tags: pytorch, pytorch-dataloader, dataloader
1 Answer

I would expect that reducing num_workers reduces memory usage proportionally: for example, num_workers=1 should use roughly a quarter of the memory that 4 workers do. Keep in mind that this also reduces parallelism, since you will be using fewer CPU cores/threads.
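A minimal sketch of that change, reusing the names from the question (dataset and SCAN_PROCESS_BATCH_SIZE are assumed to already be defined as above):

    from torch.utils.data import DataLoader

    # num_workers=1 keeps a single worker process (~2.6 GB instead of ~10.4 GB).
    # num_workers=0 would skip worker processes entirely and load data in the
    # main process, trading away data-loading parallelism.
    dataloader = DataLoader(
        dataset,
        batch_size=SCAN_PROCESS_BATCH_SIZE,
        num_workers=1,
    )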

Another factor is the batch size. The snippet passes batch_size=SCAN_PROCESS_BATCH_SIZE; I am not sure how that value is declared, but reducing it will reduce the number of images loaded into memory at once.
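For example (the value 8 below is purely illustrative, since the question does not show the real declaration):

    # Halving the batch size halves the number of processed image tensors
    # held in memory by each collated batch.
    SCAN_PROCESS_BATCH_SIZE = 8  # illustrative value, not from the question
    dataloader = DataLoader(dataset, batch_size=SCAN_PROCESS_BATCH_SIZE, num_workers=4)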

Without seeing the rest of the code and the inputs, it is hard to suggest further reductions. One thing worth checking is the size of the input images and whether it can be reduced, for example with a downscaling image transform, as sketched below.
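A sketch of that idea, assuming the input is a PIL image (MAX_SIDE and downscale are illustrative names, not from the question). Note that the processor will typically resize to the model's input size anyway, so this mainly cuts the cost of holding and transforming large source images:

    from PIL import Image

    MAX_SIDE = 512  # assumed cap; tune to your model and inputs

    def downscale(img: Image.Image) -> Image.Image:
        # thumbnail() preserves aspect ratio and never upscales
        img = img.copy()
        img.thumbnail((MAX_SIDE, MAX_SIDE))
        return img

    image = downscale(image)  # shrink once, before constructing ImageDataset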
