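I am fine-tuning ResNet18 from Hugging Face on a multi-label dataset. I want to predict 3 labels for each image, so I created 3 fully connected layers. First, I tried to update the classifier layers of ResNet18: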
# The class counts must be defined before the layers that use them
num_classes_artist = 129
num_classes_style = 27
num_classes_genre = 11

model2.classifier_artist = torch.nn.Sequential(
    torch.nn.Dropout(p=0.2, inplace=True),
    torch.nn.Linear(in_features=512, out_features=num_classes_artist, bias=True)
).to(device)
model2.classifier_style = torch.nn.Sequential(
    torch.nn.Dropout(p=0.2, inplace=True),
    torch.nn.Linear(in_features=512, out_features=num_classes_style, bias=True)
).to(device)
model2.classifier_genre = torch.nn.Sequential(
    torch.nn.Dropout(p=0.2, inplace=True),
    torch.nn.Linear(in_features=512, out_features=num_classes_genre, bias=True)
).to(device)
But this did not work. The model architecture does not include the 3 classifiers I added. Here is the torchinfo summary:
ResNetForImageClassification (ResNetForImageClassification)   [32, 3, 224, 224]   [32, 1000]
├─ResNetModel (resnet)                                        [32, 3, 224, 224]   [32, 512, 1, 1]
│    └─ResNetEmbeddings (embedder)                            [32, 3, 224, 224]   [32, 64, 56, 56]
│    │    └─ResNetConvLayer (embedder)                        [32, 3, 224, 224]   [32, 64, 112, 112]
│    │    └─MaxPool2d (pooler)                                [32, 64, 112, 112]  [32, 64, 56, 56]
│    └─ResNetEncoder (encoder)                                [32, 64, 56, 56]    [32, 512, 7, 7]
│    │    └─ModuleList (stages)                               --                  --
│    └─AdaptiveAvgPool2d (pooler)                             [32, 512, 7, 7]     [32, 512, 1, 1]
├─Sequential (classifier)                                     [32, 512, 1, 1]     [32, 1000]
│    └─Flatten (0)                                            [32, 512, 1, 1]     [32, 512]
│    └─Linear (1)                                             [32, 512]           [32, 1000]
After that, I started implementing it in plain PyTorch:
class WikiartModel(nn.Module):
    def __init__(self, num_artists, num_genres, num_styles):
        super(WikiartModel, self).__init__()
        # Shared Convolutional Layers
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        # Artist classification branch
        self.fc_artist1 = nn.Linear(256 * 16 * 16, 512)
        self.fc_artist2 = nn.Linear(512, num_artists)
        # Genre classification branch
        self.fc_genre1 = nn.Linear(256 * 16 * 16, 512)
        self.fc_genre2 = nn.Linear(512, num_genres)
        # Style classification branch
        self.fc_style1 = nn.Linear(256 * 16 * 16, 512)
        self.fc_style2 = nn.Linear(512, num_styles)

    def forward(self, x):
        # Shared convolutional layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 256 * 16 * 16)
        # Artist classification branch
        artists_out = F.relu(self.fc_artist1(x))
        artists_out = self.fc_artist2(artists_out)
        # Genre classification branch
        genre_out = F.relu(self.fc_genre1(x))
        genre_out = self.fc_genre2(genre_out)
        # Style classification branch
        style_out = F.relu(self.fc_style1(x))
        style_out = self.fc_style2(style_out)
        return artists_out, genre_out, style_out

# Set the number of classes for each task
num_artists = 129  # Including "Unknown Artist"
num_genres = 11  # Including "Unknown Genre"
num_styles = 27
Here is the torchinfo summary:
Layer (type (var_name))       Input Shape           Output Shape
================================================================================
WikiartModel (WikiartModel)   [32, 3, 224, 224]     [98, 129]
├─Conv2d (conv1)              [32, 3, 224, 224]     [32, 64, 224, 224]
├─MaxPool2d (pool)            [32, 64, 224, 224]    [32, 64, 112, 112]
├─Conv2d (conv2)              [32, 64, 112, 112]    [32, 128, 112, 112]
├─MaxPool2d (pool)            [32, 128, 112, 112]   [32, 128, 56, 56]
├─Conv2d (conv3)              [32, 128, 56, 56]     [32, 256, 56, 56]
├─MaxPool2d (pool)            [32, 256, 56, 56]     [32, 256, 28, 28]
├─Linear (fc_artist1)         [98, 65536]           [98, 512]
├─Linear (fc_artist2)         [98, 512]             [98, 129]
├─Linear (fc_genre1)          [98, 65536]           [98, 512]
├─Linear (fc_genre2)          [98, 512]             [98, 11]
├─Linear (fc_style1)          [98, 65536]           [98, 512]
├─Linear (fc_style2)          [98, 512]             [98, 27]
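The batch size of the input data ([32, 3, 224, 224]) and the batch size of the model's output predictions ([98, 129]) are different. I have checked my data loading, model architecture, and training loop, but I cannot pinpoint the root cause. The mismatch raises an error when the loss is computed inside the training loop:
loss_artist = criterion_artist(outputs_artist, labels_artist)
ValueError: Expected input batch_size (98) to match target batch_size (32).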
In your model architecture, you define three 2D convolutional layers (self.conv1, self.conv2, self.conv3) and one max-pooling layer (self.pool):
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding =1)
self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
self.pool = nn.MaxPool2d(2, 2)
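Given your input tensor of size [32, 3, 224, 224], after the image passes through
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = self.pool(F.relu(self.conv3(x)))
the size of x becomes [32, 256, 28, 28]. Each convolution uses kernel_size=3 with padding=1, so it preserves height and width, while each 2x2 max pool halves them: 224 -> 112 -> 56 -> 28.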
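The spatial sizes above can be checked by hand with the standard output-size formula for convolution and pooling layers. The following is a small sketch in plain Python; the helper name conv2d_out is made up for this illustration and it assumes dilation 1.

```python
def conv2d_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 224
trace = [size]
for _ in range(3):  # conv1..conv3, each followed by self.pool
    size = conv2d_out(size, kernel_size=3, stride=1, padding=1)  # Conv2d keeps H and W
    size = conv2d_out(size, kernel_size=2, stride=2)             # MaxPool2d(2, 2) halves them
    trace.append(size)

print(trace)              # [224, 112, 56, 28]
print(256 * size * size)  # 200704 features per sample after flattening
```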
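Now you want to feed this tensor into the fully connected layers. Your current line reshapes it into rows of 256 * 16 * 16 = 65536 features, but each sample actually holds 256 * 28 * 28 = 200704 features, so the -1 dimension is inferred as 32 * 200704 / 65536 = 98. That is where the batch size of 98 in the error comes from. To fix it, replace the line:
x = x.view(-1, 256 * 16 * 16)
with
x = x.view(x.size(0), -1)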
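This change flattens the tensor to shape [32, 200704] (256 * 28 * 28 = 200704) while keeping the batch dimension fixed. The first linear layer of each branch must then accept 200704 input features, so you should also change:
nn.Linear(256 * 16 * 16, 512)
to
nn.Linear(256 * 28 * 28, 512)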
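To see the effect of the two reshapes in isolation, here is a minimal sketch that uses a zero tensor as a stand-in for the output of the third pooling step. The variable names wrong and right are only for illustration.

```python
import torch

# Stand-in for the output of the third pooling step.
x = torch.zeros(32, 256, 28, 28)

wrong = x.view(-1, 256 * 16 * 16)  # -1 absorbs the mismatch into the batch dimension
right = x.view(x.size(0), -1)      # batch dimension is pinned, feature count is inferred

print(wrong.shape)  # torch.Size([98, 65536])
print(right.shape)  # torch.Size([32, 200704])
```

Pinning the batch dimension with x.size(0) means a wrong feature count shows up as a shape error in the first linear layer instead of silently changing the batch size.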
The modified WikiartModel class looks like this:
import torch.nn as nn
import torch.nn.functional as F


class WikiartModel(nn.Module):
    def __init__(self, num_artists, num_genres, num_styles):
        super(WikiartModel, self).__init__()
        # Shared Convolutional Layers
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        # Spatial size after three 2x2 poolings of a 224x224 input: 224 / 8 = 28
        self.size = 28
        # Artist classification branch
        self.fc_artist1 = nn.Linear(256 * self.size * self.size, 512)
        self.fc_artist2 = nn.Linear(512, num_artists)
        # Genre classification branch
        self.fc_genre1 = nn.Linear(256 * self.size * self.size, 512)
        self.fc_genre2 = nn.Linear(512, num_genres)
        # Style classification branch
        self.fc_style1 = nn.Linear(256 * self.size * self.size, 512)
        self.fc_style2 = nn.Linear(512, num_styles)

    def forward(self, x):
        # Shared convolutional layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # Flatten everything except the batch dimension
        x = x.view(x.size(0), -1)
        # Artist classification branch
        artists_out = F.relu(self.fc_artist1(x))
        artists_out = self.fc_artist2(artists_out)
        # Genre classification branch
        genre_out = F.relu(self.fc_genre1(x))
        genre_out = self.fc_genre2(genre_out)
        # Style classification branch
        style_out = F.relu(self.fc_style1(x))
        style_out = self.fc_style2(style_out)
        return artists_out, genre_out, style_out

# Set the number of classes for each task
num_artists = 129  # Including "Unknown Artist"
num_genres = 11  # Including "Unknown Genre"
num_styles = 27
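
As for the first attempt in the question: assigning classifier_artist, classifier_style and classifier_genre as attributes attaches them to the model, but the model's forward method never calls them, so the output stays [32, 1000]. That is the likely reason they are missing from the torchinfo summary. One way to get three outputs from a pretrained network is to wrap a shared backbone and call the heads explicitly. The following is a minimal sketch under stated assumptions, not a definitive implementation: MultiHeadClassifier and toy_backbone are hypothetical names, the toy backbone only stands in for the pretrained ResNet so that the example runs without downloads, and the pooler_output check assumes the real backbone returns a Hugging Face style output object.

```python
import torch
from torch import nn


class MultiHeadClassifier(nn.Module):
    """Shared backbone followed by one linear head per task (hypothetical wrapper)."""

    def __init__(self, backbone, feature_dim, num_artists, num_styles, num_genres):
        super().__init__()
        self.backbone = backbone
        self.dropout = nn.Dropout(p=0.2)
        self.head_artist = nn.Linear(feature_dim, num_artists)
        self.head_style = nn.Linear(feature_dim, num_styles)
        self.head_genre = nn.Linear(feature_dim, num_genres)

    def forward(self, x):
        features = self.backbone(x)
        # Assumption: a Hugging Face backbone returns an output object whose
        # pooled features are stored in `pooler_output`; plain tensors pass through.
        if hasattr(features, "pooler_output"):
            features = features.pooler_output
        # [B, C, 1, 1] -> [B, C], then dropout before the heads
        features = self.dropout(torch.flatten(features, start_dim=1))
        return (
            self.head_artist(features),
            self.head_style(features),
            self.head_genre(features),
        )


# Stand-in backbone for illustration only: maps [B, 3, H, W] to [B, 512, 1, 1].
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 512, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
)
model = MultiHeadClassifier(toy_backbone, 512, num_artists=129, num_styles=27, num_genres=11)
artist_logits, style_logits, genre_logits = model(torch.randn(4, 3, 32, 32))
print(artist_logits.shape, style_logits.shape, genre_logits.shape)
```

With the Hugging Face model you would presumably pass model2.resnet as the backbone (the ResNetModel in the summary, whose pooled output is [32, 512, 1, 1]) and compute one loss per head.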