Why do 'loss.backward()' and 'weight.grad' return a tensor of all zeros?

Question

When I run 'loss.backward()' and then look at 'weight.grad', I get a tensor of all zeros. Also, 'weight.grad_fn' returns None.

However, for the second layer 'w2', everything seems to return the correct result. And if I do a simple operation such as x*2 or x**2, 'backward()' and '.grad' also return the correct results.
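For instance, a minimal check like this (not part of my network, just a sanity test) behaves as expected:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # simple op: y = sum(x^2)
y.backward()
print(x.grad)  # tensor([2., 4., 6.]), i.e. 2*x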

Here is my code.

import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms

# Getting MNIST data
num_workers = 0
batch_size = 64
transform = transforms.ToTensor()
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
dataiter = iter(train_loader)
images, labels = next(dataiter)  # dataiter.next() was removed in recent PyTorch; use next()

#####################################
#####################################
#### NN Part

def activation(x):
    # Sigmoid activation
    return 1/(1+torch.exp(-x))

# Flatten the inputs from (64, 1, 28, 28) into (64, 784);
# images is already a tensor, so torch.from_numpy() is not needed
inputs = images.view(images.shape[0], -1)


w1 = torch.randn(784, 256, requires_grad=True)  # n_input, n_hidden
b1 = torch.randn(256)  # n_hidden

w2 = torch.randn(256, 10, requires_grad=True)  # n_hidden, n_output
b2 = torch.randn(10)  # n_output

h = activation(torch.mm(inputs, w1) + b1)  # hidden layer activations
y = torch.mm(h, w2) + b2  # output scores

#print(h)
#print(y)

y.sum().backward()
print(w1.grad)
print(w1.grad_fn)
#print(w2.grad)
#print(w2.grad_fn)

By the way, the same problem occurs if I run it this way instead:

images = images.reshape(images.shape[0], -1)

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))

logits = model(images)
criterion = nn.NLLLoss()

loss = criterion(logits, labels)
print(loss)
print(loss.grad_fn)


print('Before backward pass: ', model[0].weight.grad)
loss.backward()
print('After: ', model[0].weight.grad)
#print('After: ', model[2].weight.grad)
#print('After: ', model[4].weight.grad)
1 Answer

The gradients of w1 are not all zero; there are just a lot of zeros, especially around the border, because the MNIST images have many black pixels (zeros). Anything multiplied by zero produces a zero gradient.
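A quick way to verify this, using the inputs and w1 from your code: a pixel that is zero in every image of the batch contributes a row of exact zeros to w1.grad, so the two masks below should match (this sketch assumes the gradients were computed with y.sum().backward() as above):

# Pixels that are zero in every image of the batch
zero_pixels = (inputs.abs().sum(dim=0) == 0)
# Rows of w1.grad that are entirely zero
zero_rows = (w1.grad.abs().sum(dim=1) == 0)
# Expected: True, since MNIST pixels are non-negative and cannot cancel out
print(torch.equal(zero_pixels, zero_rows))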

When you print w1.grad you only see a tiny fraction of its values (the edges of the tensor), so you simply don't come across the non-zero ones.

w1.grad
# => tensor([[0., 0., 0.,  ..., 0., 0., 0.],
#            [0., 0., 0.,  ..., 0., 0., 0.],
#            [0., 0., 0.,  ..., 0., 0., 0.],
#            ...,
#            [0., 0., 0.,  ..., 0., 0., 0.],
#            [0., 0., 0.,  ..., 0., 0., 0.],
#            [0., 0., 0.,  ..., 0., 0., 0.]])

# Indices of non-zero elements
w1.grad.nonzero()
# => tensor([[ 71,   0],
#            [ 71,   1],
#            [ 71,   2],
#            ...,
#            [746, 253],
#            [746, 254],
#            [746, 255]])
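If you want to look past the corner values that PyTorch prints by default, you can widen the print window or count the non-zero entries directly:

# Show more rows/columns per tensor when printing (default is 3)
torch.set_printoptions(edgeitems=10)
print(w1.grad)

# Total number of non-zero entries in the gradient
print((w1.grad != 0).sum())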