When I run 'loss.backward()' and then check 'weight.grad', I get a tensor of all zeros. Also, 'weight.grad_fn' returns None.
However, for the second layer 'w2', everything seems to return the correct result. And if I do a simple operation such as x*2 or x**2, 'backward()' and '.grad' return the correct results.
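For reference, here is a minimal sketch of the simple case that does work (hypothetical values, just to illustrate):
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)     # tensor([2., 4., 6.]), i.e. d(x**2)/dx = 2*x
print(y.grad_fn)  # <SumBackward0 ...>, not None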
Here is my code:
import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms
# Getting MNIST data
num_workers = 0
batch_size = 64
transform = transforms.ToTensor()
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
dataiter = iter(train_loader)
images, labels = next(dataiter)  # use the built-in next(); dataiter.next() fails on newer PyTorch
#####################################
#####################################
#### NN Part
def activation(x):
    return 1 / (1 + torch.exp(-x))
inputs = images
# Flatten the inputs from (64, 1, 28, 28) into (64, 784)
inputs = inputs.reshape(images.shape[0], -1)
w1 = torch.randn(784, 256, requires_grad=True)  # n_input, n_hidden
b1 = torch.randn(256)                           # n_hidden
w2 = torch.randn(256, 10, requires_grad=True)   # n_hidden, n_output
b2 = torch.randn(10)                            # n_output
h = activation(torch.mm(inputs, w1) + b1)
y = torch.mm(h, w2) + b2
#print(h)
#print(y)
y.sum().backward()
print(w1.grad)
print(w1.grad_fn)
#print(w2.grad)
#print(w2.grad_fn)
By the way, if I try to run it this way instead, the same problem occurs:
images = images.reshape(images.shape[0], -1)
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))
logits = model(images)
criterion = nn.NLLLoss()
loss = criterion(logits, labels)
print(loss)
print(loss.grad_fn)
print('Before backward pass: ', model[0].weight.grad)
loss.backward()
print('After: ', model[0].weight.grad)
#print('After: ', model[2].weight.grad)
#print('After: ', model[4].weight.grad)
The gradients of w1 are not all zero; there are just a lot of zeros, especially around the border, because the MNIST images have a lot of black pixels (zeros). When multiplying with zero, the resulting gradients are also zero.
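Concretely, the gradient with respect to w1 has the form inputs.T @ delta for some upstream term delta, so row i of w1.grad is a weighted sum of pixel i over the batch: if pixel i is zero in every image of the batch, that entire row is zero. A quick check (a sketch reusing inputs and w1.grad from your first snippet):
# Pixels that are black (exactly zero) in every image of the batch
zero_pixels = (inputs == 0).all(dim=0)  # shape (784,)
# Rows of w1.grad that are entirely zero
zero_rows = (w1.grad == 0).all(dim=1)   # shape (784,)
# Every all-black pixel must produce an all-zero gradient row
print(zero_rows[zero_pixels].all())     # expected: tensor(True)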
By printing w1.grad you only see a very small part of the values (the border), so you just can't see the non-zero values.
w1.grad
# => tensor([[0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.],
# ...,
# [0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.]])
# Indices of non-zero elements
w1.grad.nonzero()
# => tensor([[ 71, 0],
# [ 71, 1],
# [ 71, 2],
# ...,
# [746, 253],
# [746, 254],
# [746, 255]])
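To convince yourself that w1.grad is not all zeros, you can count the non-zero entries directly; and if you want gradients to reach every row of w1, a transform that shifts pixels away from exactly zero, e.g. transforms.Normalize((0.5,), (0.5,)), would do it. A sketch:
nonzero = w1.grad.count_nonzero().item()
total = w1.grad.numel()
print(f"{nonzero} / {total} entries of w1.grad are non-zero")

# With normalized inputs no pixel stays exactly zero (black maps to -1),
# so (almost) every entry of w1.grad becomes non-zero:
# transform = transforms.Compose([transforms.ToTensor(),
#                                 transforms.Normalize((0.5,), (0.5,))])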