I am trying to build a multilayer perceptron, and I suspect I am making a mistake in the backward pass. Below is what I am currently doing.
def backward(self, dJdy_hat):
    # Gradient of the loss w.r.t. the pre-activation of the second linear layer (s2).
    # ReLU's derivative is an elementwise mask; since z2 = relu(s2), (z2 > 0) equals (s2 > 0).
    if self.g_function == "relu":
        dJds2 = dJdy_hat * (self.cache['z2'] > 0).float()
    elif self.g_function == "identity":
        dJds2 = dJdy_hat
    # Gradient of the loss w.r.t. the pre-activation of the first linear layer (s1):
    # backpropagate through W2 with a matrix product, then apply the ReLU mask elementwise.
    if self.f_function == "relu":
        dJds1 = torch.mm(dJds2, self.parameters['W2']) * (self.cache['z1'] > 0).float()
    elif self.f_function == "identity":
        dJds1 = torch.mm(dJds2, self.parameters['W2'])
    # Gradients of the loss w.r.t. the parameters.
    self.grads['dJdW2'] = torch.mm(dJds2.T, self.cache['z1'])
    self.grads['dJdb2'] = torch.sum(dJds2, dim=0)
    self.grads['dJdW1'] = torch.mm(dJds1.T, self.cache['x'])
    self.grads['dJdb1'] = torch.sum(dJds1, dim=0)
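For reference, here is the chain rule this backward() is meant to implement, written out under the layout the torch.mm calls imply (rows index the batch, with s1 = x W1^T + b1, z1 = f(s1), s2 = z1 W2^T + b2, y_hat = g(s2); this forward pass is my reading of the code rather than something stated in it):

$$
\begin{aligned}
\frac{\partial J}{\partial s_2} &= \frac{\partial J}{\partial \hat{y}} \odot g'(s_2), &
\frac{\partial J}{\partial s_1} &= \left(\frac{\partial J}{\partial s_2}\, W_2\right) \odot f'(s_1), \\
\frac{\partial J}{\partial W_2} &= \left(\frac{\partial J}{\partial s_2}\right)^{\top} z_1, &
\frac{\partial J}{\partial W_1} &= \left(\frac{\partial J}{\partial s_1}\right)^{\top} x, \\
\frac{\partial J}{\partial b_2} &= \sum_n \left(\frac{\partial J}{\partial s_2}\right)_n, &
\frac{\partial J}{\partial b_1} &= \sum_n \left(\frac{\partial J}{\partial s_1}\right)_n,
\end{aligned}
$$

where ⊙ is the elementwise product and the sums run over the batch index n. For ReLU, g'(s2) is the indicator (s2 > 0), which coincides with (z2 > 0) because z2 = relu(s2); the mask is always applied elementwise, never as a matrix product.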
z1 and z2 are the outputs of the previous layer after the activation function has been applied, and W1 and W2 are the weights. When I compare my results with autograd, I can see that they give the same results.
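For completeness, here is a minimal sketch of that kind of autograd comparison. The shapes, the relu/identity choices, and the MSE loss are all picked arbitrarily for illustration (none of them come from the original code):

import torch

# Hypothetical shapes for illustration; adapt to your own MLP instance.
torch.manual_seed(0)
x = torch.randn(8, 5)                        # batch of 8 samples, 5 features
W1 = torch.randn(4, 5, requires_grad=True)   # first linear layer: 5 -> 4
b1 = torch.randn(4, requires_grad=True)
W2 = torch.randn(3, 4, requires_grad=True)   # second linear layer: 4 -> 3
b2 = torch.randn(3, requires_grad=True)
y = torch.randn(8, 3)                        # dummy targets

# Forward pass with f = relu and g = identity (one of the configurations above).
s1 = torch.mm(x, W1.T) + b1
z1 = torch.relu(s1)
s2 = torch.mm(z1, W2.T) + b2
y_hat = s2

# Let autograd compute the reference gradients for an assumed MSE loss.
J = torch.nn.functional.mse_loss(y_hat, y)
J.backward()

# Manual gradients via the same chain rule as backward() above.
dJdy_hat = 2.0 * (y_hat.detach() - y) / y_hat.numel()  # d(mean squared error)/d(y_hat)
dJds2 = dJdy_hat                                       # g is the identity here
dJds1 = torch.mm(dJds2, W2.detach()) * (s1.detach() > 0).float()  # relu mask, elementwise

print(torch.allclose(W2.grad, torch.mm(dJds2.T, z1.detach())))  # expect True
print(torch.allclose(b2.grad, dJds2.sum(dim=0)))                # expect True
print(torch.allclose(W1.grad, torch.mm(dJds1.T, x)))            # expect True
print(torch.allclose(b1.grad, dJds1.sum(dim=0)))                # expect True

If the manual expressions are right, every allclose check prints True; a False on any one of them points directly at the corresponding line of backward().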