I am trying to implement a transfer learning approach in PyTorch. This is the dataset I am using: Dog-Breed
These are the steps I am following.
1. Load the data and read the CSV using pandas.
2. Resize the training images to (60, 60) and store them as a numpy array.
3. Apply stratification and split the training data 7:1:2 (train:validation:test).
4. Use the resnet18 model and train.
Dataset locations:
LABELS_LOCATION = './dataset/labels.csv'
TRAIN_LOCATION = './dataset/train/'
TEST_LOCATION = './dataset/test/'
ROOT_PATH = './dataset/'
Reading the CSV (labels.csv):
def read_csv(csvf):
    # print(pandas.read_csv(csvf).values)
    data = pandas.read_csv(csvf).values
    labels_dict = dict(data)  # maps image id -> class name
    idz = list(labels_dict.keys())
    clazz = list(labels_dict.values())
    return labels_dict, idz, clazz
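I do this because of a constraint that I will mention below, when loading the data with DataLoader.
def class_hashmap(class_arr):
    uniq_clazz = Counter(class_arr)
    class_dict = {}
    for i, j in enumerate(uniq_clazz):
        class_dict[j] = i  # map each class name to an integer index
    return class_dict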
labels, ids, class_names = read_csv(LABELS_LOCATION)
train_images = os.listdir(TRAIN_LOCATION)
class_numbers = class_hashmap(class_names)
Next, I use opencv to resize the images to 60x60 and store the results as a numpy array.
resize = []
indexed_labels = []
for t_i in train_images:
    # resize.append(transform.resize(io.imread(TRAIN_LOCATION+t_i), (60, 60, 3))) # (60,60) is the height and width; 3 is the number of channels
    # cv2 returns HxWxC arrays, so transpose (rather than reshape) to get CxHxW
    resize.append(cv2.resize(cv2.imread(TRAIN_LOCATION+t_i), (60, 60)).transpose(2, 0, 1))
    indexed_labels.append(class_numbers[labels[t_i.split('.')[0]]])
resize = np.asarray(resize)
print(resize.shape)
In indexed_labels, I assign a number to each label.
Next, I split the data 7:1:2.
X = resize # numpy array of images [training data]
y = np.array(indexed_labels) # indexed labels for images [training labels]
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
sss.get_n_splits(X, y)
for train_index, test_index in sss.split(X, y):
    X_temp, X_test = X[train_index], X[test_index] # split train into train and test [data]
    y_temp, y_test = y[train_index], y[test_index] # labels
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.123, random_state=0)
sss.get_n_splits(X_temp, y_temp)
for train_index, test_index in sss.split(X_temp, y_temp):
    print("TRAIN:", train_index, "VAL:", test_index)
    # these indices refer to X_temp / y_temp, not to the full X / y
    X_train, X_val = X_temp[train_index], X_temp[test_index] # training and validation data
    y_train, y_val = y_temp[train_index], y_temp[test_index] # training and validation labels
Next, I load the data from the previous step into torch DataLoaders.
batch_size = 500
learning_rate = 0.001
train = torch.utils.data.TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=False)
val = torch.utils.data.TensorDataset(torch.from_numpy(X_val), torch.from_numpy(y_val))
val_loader = torch.utils.data.DataLoader(val, batch_size=batch_size, shuffle=False)
test = torch.utils.data.TensorDataset(torch.from_numpy(X_test), torch.from_numpy(y_test))
test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=False)
# print(train_loader.size)
dataloaders = {
    'train': train_loader,
    'val': val_loader
}
Next, I load the pretrained ResNet model.
model_ft = models.resnet18(pretrained=True)
# freeze all model parameters
# for param in model_ft.parameters():
#     param.requires_grad = False
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, len(class_numbers))
if use_gpu:
    model_ft = model_ft.cuda()
    model_ft.fc = model_ft.fc.cuda()
criterion = nn.CrossEntropyLoss()
# Only the parameters of the new final layer are passed to the optimizer
optimizer_ft = optim.SGD(model_ft.fc.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)
Then I use train_model, a method described here in the PyTorch documentation.
However, when I run this, I get an error.
Traceback (most recent call last):
  File "/Users/nirvair/Sites/pyTorch/TL.py", line 244, in <module>
    num_epochs=25)
  File "/Users/nirvair/Sites/pyTorch/TL.py", line 176, in train_model
    outputs = model(inputs)
  File "/Library/Python/2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/Library/Python/2.7/site-packages/torchvision/models/resnet.py", line 149, in forward
    x = self.avgpool(x)
  File "/Library/Python/2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/Library/Python/2.7/site-packages/torch/nn/modules/pooling.py", line 505, in forward
    self.padding, self.ceil_mode, self.count_include_pad)
  File "/Library/Python/2.7/site-packages/torch/nn/functional.py", line 264, in avg_pool2d
    ceil_mode, count_include_pad)
  File "/Library/Python/2.7/site-packages/torch/nn/_functions/thnn/pooling.py", line 360, in forward
    ctx.ceil_mode, ctx.count_include_pad)
RuntimeError: Given input size: (512x2x2). Calculated output size: (512x0x0). Output size is too small at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/THNN/generic/SpatialAveragePooling.c:64
I can't seem to figure out what is going wrong here.
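Your network is too deep for the image size you are using (60x60). As you know, the layers of a CNN produce smaller and smaller feature maps as the input image propagates through them. In ResNet-18 this reduction comes from the stride-2 convolutions and the max-pooling layer, each of which roughly halves the spatial size.
The error you got means that the average pooling layer received 512 feature maps of size 2x2 pixels, which is smaller than its pooling window, so the computed output would be 512 maps of size 0x0. That is what triggers the error.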
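As a rough check of the shape arithmetic behind this error, the sketch below traces the side length of a square input through the downsampling stages of ResNet-18 and then through a fixed 7x7 average pool. The kernel, stride, and padding values are assumed to match the standard torchvision ResNet-18 definition, and the helper names (`conv_out`, `resnet18_spatial_size`) are made up for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial axis (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the five stride-2 downsampling steps,
# assumed to match torchvision's ResNet-18: conv1, maxpool, layer2, layer3, layer4.
RESNET18_DOWNSAMPLING = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]


def resnet18_spatial_size(size):
    """Side length of the feature maps that reach the final pooling layer."""
    for kernel, stride, padding in RESNET18_DOWNSAMPLING:
        size = conv_out(size, kernel, stride, padding)
    return size


for side in (60, 224):
    fmap = resnet18_spatial_size(side)
    # Older torchvision releases used a fixed AvgPool2d(7); a non-positive
    # result means the pooling window does not fit into the feature map.
    pooled = max(conv_out(fmap, kernel=7, stride=1, padding=0), 0)
    print(f"{side}x{side} input -> {fmap}x{fmap} maps -> {pooled}x{pooled} after 7x7 avg pool")
```

Under these assumptions a 60x60 input reaches the pooling layer as 2x2 maps, which a 7x7 window cannot cover, while a 224x224 input arrives as 7x7 maps and pools to 1x1.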
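Generally, all stock networks, such as ResNet-18, Inception, etc., require input images of size 224x224 (at least). You can do the resizing more easily using torchvision transforms [1]. You can also use larger image sizes, with the exception of AlexNet, which has a hardcoded feature vector size, as explained in my answer at [2].
Bonus tip: if you are using the network in pretrained mode, you will need to normalize (whiten) the data using the parameters given in the PyTorch documentation at [3].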
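For the normalization step, the sketch below shows the per-channel arithmetic in NumPy. It assumes the ImageNet mean and standard deviation that the torchvision documentation lists for its pretrained models, applied to RGB inputs scaled to [0, 1]. The function name `normalize_for_imagenet` is made up for illustration, and the random array only stands in for a real, already resized image.

```python
import numpy as np

# Per-channel statistics (RGB order) listed in the torchvision docs for
# ImageNet-pretrained models; inputs are expected to be scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_for_imagenet(image_hwc_uint8):
    """Convert an HxWx3 uint8 RGB image into a normalized 3xHxW float32 array."""
    scaled = image_hwc_uint8.astype(np.float32) / 255.0   # [0, 255] -> [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD   # broadcast over the channel axis
    return normalized.transpose(2, 0, 1)                   # HWC -> CHW, as PyTorch expects


# Placeholder image standing in for a photo already resized to 224x224.
rng = np.random.default_rng(0)
dummy = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

out = normalize_for_imagenet(dummy)
print(out.shape, out.dtype)
```

`torchvision.transforms.Normalize` performs the same subtraction and division on tensors. Note that images read with `cv2.imread` are in BGR order, so the channels would need to be reversed before applying RGB statistics.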
Links
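I am just adding to @Mo Hossny's answer: the input shape does not need to be 224x224 (at least).
Actually, it should be at least 33x33, but 224x224 is the recommended shape, since the networks were originally trained on inputs of that shape. What @Mo Hossny correctly points out is the dimensionality-reducing nature of CNNs such as ResNet18. However, thanks to the AdaptiveAvgPool2d layer (the "adaptive" part allows passing images both larger and smaller than the recommended shape), the feature maps obtained before that layer are, in theory, as follows: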
Input shape        Feature maps (input to AdaptiveAvgPool2d)
3x32x32     -->    batch_size x 512 x (cannot reduce dimensionality more)
3x33x33     -->    batch_size x 512 x 2x2
3x64x64     -->    batch_size x 512 x 2x2
3x128x128   -->    batch_size x 512 x 4x4
3x224x224   -->    batch_size x 512 x 7x7
3x256x256   -->    batch_size x 512 x 8x8
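To illustrate why an adaptive pooling layer accepts feature maps of any of these sizes, here is a plain-Python sketch of 2D adaptive average pooling on nested lists. It is not PyTorch's implementation: the function name and the window formula (start = floor(i*n/out), end = ceil((i+1)*n/out)) are my own rendering of the usual scheme, shown only to make the idea concrete.

```python
import math


def adaptive_avg_pool2d(fmap, out_size):
    """Average-pool a square 2D list down to out_size x out_size.

    Window boundaries are derived from the input size, so any input
    with side >= 1 yields the requested output size.
    """
    n = len(fmap)
    pooled = []
    for i in range(out_size):
        r0, r1 = (i * n) // out_size, math.ceil((i + 1) * n / out_size)
        row = []
        for j in range(out_size):
            c0, c1 = (j * n) // out_size, math.ceil((j + 1) * n / out_size)
            window = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


small = [[1.0, 2.0], [3.0, 4.0]]                                  # a 2x2 feature map
large = [[float(r * 7 + c) for c in range(7)] for r in range(7)]  # a 7x7 feature map

print(adaptive_avg_pool2d(small, 1))  # one value: the mean of all four entries
print(adaptive_avg_pool2d(large, 1))  # one value: the mean of all 49 entries
```

Both inputs reduce to a single value per channel, which is why the fully connected layer that follows sees the same number of features regardless of the input image size.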
Experimenting with resnet18 as the model:
model(torch.rand(1, 3, 33, 33))
works, whereas
model(torch.rand(1, 3, 32, 32))
results in ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])
So, in this case, your image size should not be the problem.