I have to use the MSELoss function to define the loss for a classification problem, and it keeps raising an error about tensor shapes.
Error message:
torch.Size([32, 10]) torch.Size([32])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-56-483bc54943a4> in <module>
53 output = model.forward(images)
54 print(output.shape, labels.shape)
---> 55 loss = criterion(output, labels)
56 loss.backward()
57 optimizer.step()
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
429
430 def forward(self, input, target):
--> 431 return F.mse_loss(input, target, reduction=self.reduction)
432
433
/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py in mse_loss(input, target, size_average, reduce, reduction)
2213 ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
2214 else:
-> 2215 expanded_input, expanded_target = torch.broadcast_tensors(input, target)
2216 ret = torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
2217 return ret
/opt/conda/lib/python3.7/site-packages/torch/functional.py in broadcast_tensors(*tensors)
50 [0, 1, 2]])
51 """
---> 52 return torch._C._VariableFunctions.broadcast_tensors(tensors)
53
54
RuntimeError: The size of tensor a (10) must match the size of tensor b (32) at non-singleton dimension 1
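How do I reshape the tensors, and which tensor (the output or the labels) should I change in order to compute the loss?
The full code is attached below.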
import numpy as np
import torch
# Loading the Fashion-MNIST dataset
from torchvision import datasets, transforms
# Get GPU Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('MNIST_data/', download = True, train = True, transform = transform)
testset = datasets.FashionMNIST('MNIST_data/', download = True, train = False, transform = transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 32, shuffle = True, num_workers=4)
testloader = torch.utils.data.DataLoader(testset, batch_size = 32, shuffle = True, num_workers=4)
# Examine a sample
dataiter = iter(trainloader)
images, labels = next(dataiter)
# Define the network architecture
from torch import nn, optim
import torch.nn.functional as F
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim = 1))
model.to(device)
# Define the loss
criterion = nn.MSELoss()
# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr = 0.001)
# Define the epochs
epochs = 5
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten Fashion-MNIST images into a 784 long vector
        images = images.to(device)
        labels = labels.to(device)
        images = images.view(images.shape[0], -1)
        # Training pass
        optimizer.zero_grad()
        output = model.forward(images)
        print(output.shape, labels.shape)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0
        # Turn off gradients for validation, saves memory and computation
        with torch.no_grad():
            # Set the model to evaluation mode
            model.eval()
            # Validation pass
            for images, labels in testloader:
                images = images.to(device)
                labels = labels.to(device)
                images = images.view(images.shape[0], -1)
                ps = model(images)
                test_loss += criterion(ps, labels)
                top_p, top_class = ps.topk(1, dim = 1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))
        model.train()
        print("Epoch: {}/{}..".format(e+1, epochs),
              "Training loss: {:.3f}..".format(running_loss/len(trainloader)),
              "Test loss: {:.3f}..".format(test_loss/len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))
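Look at the shapes printed just before the error:
torch.Size([32, 10]) torch.Size([32])
The one on the left is what the model gives you, and the one on the right comes from trainloader. Integer labels of that shape are what you would normally pass to something like nn.CrossEntropyLoss.
The full error log shows that the error comes from this line:
loss = criterion(output, labels)
MSELoss compares its two arguments element by element, so the labels must have the same [32, 10] shape as the output. The technique for getting them into that shape is called one-hot encoding. If it were me, I would write it like this: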
ones = torch.eye(10, device=device)  # 10 = number of classes; row i is the one-hot vector for class i
labels = ones.index_select(0, labels)  # shape [32] -> [32, 10], float, same shape as the model output
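For comparison, here is a minimal self-contained sketch of the same idea using torch.nn.functional.one_hot instead of an identity matrix. The random `output` and `labels` tensors are stand-ins I made up for one batch from the question's training loop, not real Fashion-MNIST data.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

num_classes = 10                                # Fashion-MNIST has 10 classes
output = torch.randn(32, num_classes)           # stand-in for model(images)
labels = torch.randint(0, num_classes, (32,))   # stand-in for a batch of integer labels

# one_hot returns an int64 tensor of shape [32, 10]; MSELoss needs floats.
targets = F.one_hot(labels, num_classes=num_classes).float()

criterion = nn.MSELoss()
loss = criterion(output, targets)               # both tensors are now [32, 10]
print(targets.shape, loss.item())
```

One caveat: the model in the question ends in nn.LogSoftmax, so its outputs are log-probabilities. If the goal is to compare probabilities against the 0/1 targets, calling output.exp() before the loss is probably closer to the intent.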
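Alternatively, you can change the loss function from nn.MSELoss() to nn.CrossEntropyLoss(). For classification tasks like this one, cross-entropy loss is generally preferable to MSE, and PyTorch's implementation takes care of much of the underlying shape handling, so you can give it a vector of per-class scores and a single class label. Note that nn.CrossEntropyLoss() expects raw, unnormalized scores and applies log-softmax internally. Since the model in the question already ends in nn.LogSoftmax, either drop that layer or use nn.NLLLoss(), which is the loss that pairs with log-probabilities.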
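A small sketch of the two pairings, using random tensors as stand-ins for one batch (the names `logits` and `log_probs` are mine, not from the question). It shows that both losses accept labels of shape [32] directly and that the two routes give the same value.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

logits = torch.randn(32, 10)            # raw scores, no softmax applied
labels = torch.randint(0, 10, (32,))    # integer class indices, shape [32]

# Option 1: raw scores go straight into CrossEntropyLoss.
ce = nn.CrossEntropyLoss()(logits, labels)

# Option 2: keep a LogSoftmax output layer and pair it with NLLLoss.
log_probs = F.log_softmax(logits, dim=1)
nll = nn.NLLLoss()(log_probs, labels)

print(ce.item(), nll.item())            # the two values agree
```

In the question's code, option 2 means keeping the model as it is and only replacing the criterion with nn.NLLLoss().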
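Fundamentally, your model tries to predict which class an input belongs to by computing a score for each possible class (you could call it a "confidence score"). So if you have 10 classes, the model's output for a single input is a 10-element list (in PyTorch, a tensor of shape [10]; for a batch of 32 inputs, [32, 10]), and the prediction is the index of the highest score. Usually one applies the softmax (https://en.wikipedia.org/wiki/Softmax_function) function to turn these scores into a probability distribution, so that every score lies between 0 and 1 and the elements sum to 1.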
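To make that concrete, here is a tiny sketch with three made-up scores (the values are arbitrary, chosen only for illustration):

```python
import torch

scores = torch.tensor([2.0, 0.5, 1.0])   # made-up scores for 3 classes
probs = torch.softmax(scores, dim=0)

print(probs)                   # each entry lies strictly between 0 and 1
print(probs.sum().item())      # sums to 1, up to floating-point rounding
print(probs.argmax().item())   # index of the predicted class
```

Softmax preserves the ordering of the scores, so the predicted class is the same whether you take the argmax before or after applying it.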
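Cross-entropy is a common choice of loss function for this task: it compares the list of predictions with the one-hot encoded label. For example, if you have 3 classes, the label [1, 0, 0] denotes the first class; that representation is what "one-hot encoding" means. Meanwhile, the prediction might look like [0.7, 0.1, 0.2]. In PyTorch, nn.CrossEntropyLoss() expects each label as a single value, the class index, because there is no real need to move long, sparse vectors around in memory. So this loss function does the comparison you are trying to do, and I would guess its implementation is more efficient than actually building the one-hot encoding.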
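The 3-class example can be checked by hand. The sketch below writes the cross-entropy out against the one-hot label and compares it with the index-based version. I use nn.NLLLoss on log-probabilities here because the example prediction is already a probability vector; nn.CrossEntropyLoss would apply softmax again on top of it.

```python
import math
import torch
from torch import nn

probs = torch.tensor([[0.7, 0.1, 0.2]])     # predicted distribution, batch of 1
one_hot = torch.tensor([[1.0, 0.0, 0.0]])   # label written as a one-hot vector
label = torch.tensor([0])                   # the same label as a class index

# Cross-entropy written out: -sum(one_hot * log(probs)) per sample, then averaged.
manual = -(one_hot * probs.log()).sum(dim=1).mean()

# NLLLoss takes log-probabilities plus the class index and picks out the same term.
indexed = nn.NLLLoss()(probs.log(), label)

print(manual.item(), indexed.item(), -math.log(0.7))
```

All three printed values match: with a one-hot label, only the log-probability of the true class contributes, which is why the index alone is enough.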