I have a torch.utils.data.DataLoader, which I created with the following code.
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
trainset = CIFAR100WithIdx(root='.',
                           train=True,
                           download=True,
                           transform=transform_train,
                           rand_fraction=args.rand_fraction)
train_loader = torch.utils.data.DataLoader(trainset,
                                           batch_size=args.batch_size,
                                           shuffle=True,
                                           num_workers=args.workers)
But when I run the following code, I get an error.
train_loader_2 = []
for i, (inputs, target, index_dataset) in enumerate(train_loader):
    train_loader_2.append((inputs, target, index_dataset))
The error is:
Traceback (most recent call last):
  File "main_superloss.py", line 460, in <module>
    main()
  File "main_superloss.py", line 456, in main
    main_worker(args)
  File "main_superloss.py", line 374, in main_worker
    train_loader, val_loader = get_train_and_val_loader(args)
  File "main_superloss.py", line 120, in get_train_and_val_loader
    for i, (inputs, target, index_dataset) in enumerate(train_loader):
  File "/home/C00423766/.conda/envs/dp/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/C00423766/.conda/envs/dp/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 771, in _get_data
    success, data = self._try_get_data()
  File "/home/C00423766/.conda/envs/dp/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self.data_queue.get(timeout=timeout)
  File "/home/C00423766/.conda/envs/dp/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/C00423766/.conda/envs/dp/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
    fd = df.detach()
  File "/home/C00423766/.conda/envs/dp/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/C00423766/.conda/envs/dp/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/C00423766/.conda/envs/dp/lib/python3.7/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata
The reason I want the data in a list is that I want to reorder the samples, not randomly but in a specific way. How can I do that?
import resource
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (2048, rlimit[1]))
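A slightly more defensive variant of the snippet above, as a sketch only. `resource.setrlimit` raises `ValueError` when the requested soft limit exceeds the hard limit, and the hard-coded 2048 would lower a soft limit that is already higher. The helper name `raise_nofile_limit` and its default target are assumptions made for illustration, not part of any library.

```python
import resource


def raise_nofile_limit(target=2048):
    """Raise the soft open-file limit toward `target` without exceeding the hard limit.

    Hypothetical helper: returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    # Cap the request at the hard limit unless the hard limit is unlimited.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
            soft = new_soft
        except (ValueError, OSError):
            pass  # the OS refused the change; keep the current limit
    return soft


print("soft open-file limit:", raise_nofile_limit(2048))
```

Like the original snippet, this only affects the current process and its children, so it has to run before the DataLoader workers are started.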
I was able to fix it with:
sudo vim /etc/security/limits.conf
# TODO add `* soft nofile 4096` to the end of the file without a `#`.
sudo vim /etc/pam.d/common-session
# TODO add `session required pam_limits.so` to the end of the file without a `#`
# Log out and log back in and you should be good. Check with ulimit -n
# NOTE: also need to restart ssh/any screen sessions as they remember the fd limit.
On Ubuntu, you need to do the following to fix this for all users.
Add the line
session required pam_limits.so
to the common-session* files (there are several!):
$ sudo nano /etc/pam.d/common-session
$ sudo nano /etc/pam.d/common-session-noninteractive
Then add the lines
* soft nofile 4096
* hard nofile 4096
to the limits.conf file:
$ sudo nano /etc/security/limits.conf
After logging back in, you should see:
$ ulimit -a
...
open files (-n) 4096
...
This should fix the problem permanently on your Ubuntu machine.
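Because long-lived ssh or screen sessions can keep the old limit, it may help to check from inside the Python process itself. The sketch below prints the limits the process actually sees, plus a rough count of the descriptors it currently has open. The helper name `open_fd_count` is hypothetical, and the `/proc/self/fd` and `/dev/fd` paths are assumptions about the platform (Linux and some other Unix systems).

```python
import os
import resource


def open_fd_count():
    """Roughly count this process's open file descriptors (hypothetical helper).

    Returns None if neither descriptor directory is available on this platform.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        try:
            return len(os.listdir(fd_dir))
        except OSError:
            continue
    return None


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open descriptors: {open_fd_count()}, soft limit: {soft}, hard limit: {hard}")
```

If the count climbs toward the soft limit while batches are being appended to a list, that is consistent with descriptor exhaustion being the cause of the error, though this is a diagnostic hint rather than proof.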
If for some reason (e.g. you are not an administrator) you cannot use ulimit to change the file descriptor limit, you can also use
import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')
right after importing torch (before doing anything else).
(Source: this PyTorch issue comment.)
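On the original question of iterating in a specific, non-random order: one option that avoids collecting every batch in a list is to hand the DataLoader the order directly through its `sampler` argument. The sketch below is plain Python; `FixedOrderSampler`, `order_by_score`, and the toy `scores` are hypothetical names and data made up for illustration. It assumes a PyTorch version whose DataLoader accepts any iterable with `__len__` as a sampler; on older versions it may be necessary to subclass `torch.utils.data.Sampler` instead.

```python
class FixedOrderSampler:
    """Yields dataset indices in a caller-specified order (hypothetical helper)."""

    def __init__(self, order):
        self.order = list(order)

    def __iter__(self):
        return iter(self.order)

    def __len__(self):
        return len(self.order)


def order_by_score(scores, descending=False):
    """Return sample indices sorted by a per-sample score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=descending)


# Toy per-sample scores, invented for illustration only.
scores = [0.7, 0.1, 0.9, 0.4]
sampler = FixedOrderSampler(order_by_score(scores))
print(list(sampler))  # [1, 3, 0, 2]

# Assumed usage with the loader from the question; shuffle must be left off
# when a sampler is supplied:
# train_loader = torch.utils.data.DataLoader(trainset,
#                                            batch_size=args.batch_size,
#                                            sampler=sampler,
#                                            num_workers=args.workers)
```

Since the dataset in the question already returns each sample's index, the order can be recomputed between epochs and a new loader built from it, without holding worker-produced tensors in a list.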