Cannot return a boolean variable from a PyTorch Dataset's __getitem__

Problem description

I have a PyTorch Dataset subclass from which I create a PyTorch DataLoader. It works when I return two tensors from the Dataset's __getitem__() method. My attempt at a minimal (but non-working, more on that later) reproduction is as follows:

import torch
from torch.utils.data import Dataset
import random

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class DummyDataset(Dataset):
    def __init__(self, num_samples=3908, window=10): # same default values as in the original code
        self.window = window
        # Create dummy data
        self.x = torch.randn(num_samples, 10, dtype=torch.float32, device='cpu')  
        self.y = torch.randn(num_samples, 3, dtype=torch.float32, device='cpu')
        self.t = {i: random.choice([True, False]) for i in range(num_samples)}

    def __len__(self):
        return len(self.x) - self.window + 1

    def __getitem__(self, i):
        return self.x[i: i + self.window], self.y[i + self.window - 1] #, self.t[i]

ds = DummyDataset()
dl = torch.utils.data.DataLoader(ds, batch_size=10, shuffle=False, generator=torch.Generator(device='cuda'), num_workers=4, prefetch_factor=16)

for data in dl:
    x = data[0]
    y = data[1]
    # t = data[2]
    print(f"x: {x.shape}, y: {y.shape}") # , t: {t}
    break   

The code above gives the following error:

RuntimeError: Expected a 'cpu' device type for generator but found 'cuda'

on the line

for data in dl:

But my original code does exactly the same thing as the above: the dataset contains tensors created on the cpu, the dataloader's generator device is set to cuda, and it works (I mean, the minimal code above does not work, but the very same lines in my original code do!).

When I try to return a boolean from the __getitem__() method by uncommenting the trailing , self.t[i] part, it gives me the following error:

Traceback (most recent call last):
  File "/my_project/src/train.py", line 66, in <module>
    trainer.train_validate()
  File "/my_project/src/trainer_cpu.py", line 146, in train_validate
    self.train()
  File "/my_project/src/trainer_cpu.py", line 296, in train
    for train_data in tqdm(self.train_dataloader, desc=">> train", mininterval=5):
  File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/site-packages/torch/_utils.py", line 706, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 317, in default_collate
    return collate(batch, collate_fn_map=default_collate_fn_map)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 174, in collate
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 174, in <listcomp>
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 146, in collate
    return collate_fn_map[collate_type](batch, collate_fn_map=collate_fn_map)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 235, in collate_int_fn
    return torch.tensor(batch)
  File "/usr/local/lib/python3.9/site-packages/torch/utils/_device.py", line 79, in __torch_function__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py", line 300, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Why does this happen? Why am I not allowed to return an extra boolean from __getitem__?

PS:

The above is the main question. However, I noticed a strange thing along the way: if I change the generator device of the DataLoader from cuda to cpu, the code above starts working (with or without the , self.t[i] part commented out)! That is, if I replace generator=torch.Generator(device='cuda') with generator=torch.Generator(device='cpu'), it outputs:

x: torch.Size([10, 10, 10]), y: torch.Size([10, 3])

If I do the same in my original code, I get the following error:

RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

on the line

for data in dl:

Update

When I change self.t from a Python dict to a torch tensor of dtype bool and move it to the cpu, it starts working:

self.t = torch.tensor([random.choice([True, False]) for _ in range(num_samples)], dtype=torch.bool).to('cpu')

Please explain why.

python python-3.x machine-learning pytorch
1 Answer

Use

torch.Generator(device='cpu')

You should not do anything cuda-related inside the dataloader, especially when it runs multiple worker processes. Pull a batch out of the dataloader and move it to cuda only after the output has been collated.
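For example, a minimal sketch (reusing the ds and device already defined in the question's snippet) of a loader that stays entirely on the CPU, with the finished batch transferred to the GPU in the main process:

# Sampling RNG stays on the CPU; only the collated batch touches the GPU.
dl = torch.utils.data.DataLoader(
    ds,
    batch_size=10,
    shuffle=False,
    generator=torch.Generator(device='cpu'),
    num_workers=4,
    prefetch_factor=16,
)

for x, y in dl:
    x = x.to(device)  # transfer after collation, in the main process
    y = y.to(device)
    break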

The dataloader's generator sets the RNG state used for sampling. It expects a CPU generator. That is why you get the error

RuntimeError: Expected a 'cpu' device type for generator but found 'cuda'

Cuda generators are for producing random numbers on the GPU inside a cuda process - they should not be used with a dataloader.
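For contrast, a small sketch of where a cuda generator does belong: seeding random-number generation directly on the GPU (this assumes a CUDA device is available):

g = torch.Generator(device='cuda')
g.manual_seed(0)
noise = torch.randn(4, 4, generator=g, device='cuda')  # sampled on the GPU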

Cannot re-initialize CUDA in forked subprocess
is caused by attempting a cuda operation inside a forked worker process. It is hard to say for sure without the original code, but the traceback points at the collate step: when __getitem__ returns a plain Python bool, default_collate calls torch.tensor(batch), which allocates a new tensor on the current default device. The torch/utils/_device.py frame suggests the default device is set to cuda in the original code, so the forked worker tries to initialize CUDA and fails. That would also explain the update: a CPU torch.bool tensor is simply stacked by the collate function, so no new tensor is created on the cuda device.
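If cuda initialization inside the workers genuinely cannot be avoided, the error message itself names the remaining escape hatch: start the workers with 'spawn' instead of 'fork'. A hedged sketch, again reusing ds from the question:

# 'spawn' workers start fresh interpreters, so CUDA can be initialized in
# them; this trades away the startup speed of forked workers.
dl = torch.utils.data.DataLoader(
    ds,
    batch_size=10,
    num_workers=4,
    multiprocessing_context='spawn',
)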
