I'm trying to build a classification neural network using only the NumPy library. I've written the whole network and gone through its logic, and it looks correct to me, so I can't tell what is keeping it from reaching good parameter values. One thing I noticed is that the weights in the first layer don't seem to change at all.
What could be causing the code not to work as expected?
import numpy as np
from tensorflow.keras.datasets import mnist

# Load MNIST, flatten each 28x28 image to a 784-vector, scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], -1)
x_test = x_test.reshape(x_test.shape[0], -1)
x_train, x_test = x_train / 255, x_test / 255
print(x_train.shape)

from sklearn.preprocessing import OneHotEncoder

# One-hot encode the labels; fit on the training labels, reuse for the test set
ohe = OneHotEncoder(sparse_output=False)
y_train = ohe.fit_transform(y_train.reshape(-1, 1))
y_test = ohe.transform(y_test.reshape(-1, 1))
print(y_train.shape)
def linear(x, deriv=False):
    if deriv:
        return np.ones_like(x)  # derivative of the identity function is 1
    return x

def relu(x, deriv=False):
    if deriv:
        return (x > 0).astype(float)
    return np.maximum(0, x)

def softmax(x, deriv=False):
    # deriv is ignored here: the softmax derivative is folded into the
    # cross-entropy delta (a - y) computed in backprop
    xre = x - x.max(axis=0, keepdims=True)  # shift for numerical stability
    xexp = np.exp(xre)
    a = xexp.sum(axis=0, keepdims=True)
    return xexp / a

def sigmoid(x, deriv=False):
    a = 1 / (1 + np.exp(-x))
    if deriv:
        return a * (1 - a)
    return a

activations = {'linear': linear, 'relu': relu, 'sigmoid': sigmoid, 'softmax': softmax}
def initialvals(cols=784):
    # Network shape: 784 inputs -> 10 hidden units -> 10 outputs
    shape = [cols, 10, 10]
    w = dict()
    b = dict()
    for i in range(len(shape) - 1):
        w[i + 1] = np.random.uniform(-0.5, 0.5, (shape[i + 1], shape[i]))
        b[i + 1] = np.zeros((shape[i + 1], 1))
    return w, b
def allprints(ww, bb):
    print('Weights')
    for i in ww:
        print(ww[i].shape)
    print('Biases')
    for i in bb:
        print(bb[i].shape)
    print('Weights')
    for i in ww:
        print(i)
        print(ww[i])
        print()
    print('Biases')
    for i in bb:
        print(i)
        print(bb[i])
        print()
def forprop(inputs, weight, bias, acts, av):
    # z[i]: pre-activations, a[i]: activations; columns are samples
    z = dict()
    a = dict()
    z[0] = inputs.T
    a[0] = acts[av[0]](z[0])
    for i in range(1, len(weight) + 1):
        z[i] = np.dot(weight[i], a[i - 1]) + bias[i]
        a[i] = acts[av[i]](z[i])
    return z, a
def backprop(inputs, output, weight, bias, acts, av, size=50, iters=20, lr=0.01):
    n_samples = inputs.shape[0]
    for k in range(iters):
        # Reshuffle the training set once per epoch
        shuff = np.random.permutation(n_samples)
        inputs = inputs[shuff]
        output = output[shuff]
        for i in range(0, n_samples, size):
            batch_inputs = inputs[i:i + size]
            batch_output = output[i:i + size]
            z, a = forprop(batch_inputs, weight, bias, acts, av)
            er = dict()
            # Output delta for softmax + cross-entropy: a - y
            er[len(bias)] = a[len(bias)] - batch_output.T
            for j in range(len(bias) - 1, 0, -1):
                # delta_h = w_h_o.T @ delta_o * activation'(z_h)
                er[j] = np.dot(weight[j + 1].T, er[j + 1]) * acts[av[j]](z[j], deriv=True)
            for j in range(1, len(bias) + 1):
                bias[j] -= lr * er[j].mean(axis=1, keepdims=True)
                weight[j] -= lr * (np.dot(er[j], a[j - 1].T) / batch_inputs.shape[0])
    return weight, bias, er, z
we, be = initialvals()
allprints(we, be)
w_calc, b_calc, er, z = backprop(x_train, y_train, we, be, activations,
                                 ['linear', 'sigmoid', 'softmax'])
allprints(w_calc, b_calc)
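To check how the parameters returned above actually perform, a small evaluation step can be appended to the script (a minimal sketch reusing forprop; pred and truth are illustrative names):

# Run the trained parameters on the test set; a[2] holds the softmax
# probabilities, one column per sample
_, a = forprop(x_test, w_calc, b_calc, activations, ['linear', 'sigmoid', 'softmax'])
pred = a[2].argmax(axis=0)        # predicted class per test image
truth = y_test.argmax(axis=1)     # true class from the one-hot labels
print('test accuracy:', (pred == truth).mean())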
I've checked the error values and the shapes of all the operations, and they don't seem to be the problem.
I've tried different learning rates and batch sizes, and I even built the same setup with the TensorFlow library, which gave good predictions, so the model architecture isn't the problem.
I've also tried initializing my parameters in different ways, e.g. with random.randn, zeros, etc.
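For instance, a randn-based variant along these lines (initialvals_randn is just an illustrative name; the 1/sqrt(fan_in) scaling is the usual Xavier-style choice):

def initialvals_randn(cols=784):
    # Hypothetical variant of initialvals: zero-mean Gaussians scaled by
    # 1/sqrt(fan_in) instead of uniform(-0.5, 0.5)
    shape = [cols, 10, 10]
    w = {i + 1: np.random.randn(shape[i + 1], shape[i]) / np.sqrt(shape[i])
         for i in range(len(shape) - 1)}
    b = {i + 1: np.zeros((shape[i + 1], 1)) for i in range(len(shape) - 1)}
    return w, b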
It took me a while to realize that sometimes things are not what they seem.
The weights of the first layer actually DO change on every iteration!
To convince yourself, add this line to the allprints function:
print(np.sum(abs(ww[i])))
Since W1 has 784 columns, the printout is truncated and only shows the first and last few rows/columns. But dW1 = er[1] @ X / n, where X is the training data. The images are grey digits on a black background, and since the digits are more or less centred, after flattening, the first and last few dozen entries of every image in the batch are 0 (black). When such a matrix is multiplied by another, the result has 0s at the beginning and end of each row. So under the update W1 <- W1 - lr * dW1, the numbers shown in the truncated printout don't change ... but many of the numbers that are not printed do.
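A quick way to see this directly, building on the code above (a minimal sketch; w_before and dw1 are names introduced here, and the deepcopy is needed because backprop updates the dicts in place):

import copy

w0, b0 = initialvals()
w_before = copy.deepcopy(w0)       # snapshot before training
w_after, b_after, _, _ = backprop(x_train, y_train, w0, b0, activations,
                                  ['linear', 'sigmoid', 'softmax'], iters=1)

dw1 = np.abs(w_after[1] - w_before[1]).sum(axis=0)  # total change per input pixel
print(dw1[:10])                # border columns: all zeros
print(dw1[350:360])            # central columns: clearly nonzero
print(x_train[:, :10].max())   # 0.0 -- these pixels are black in every image

The border columns stay exactly at their initial values because the corresponding pixels are 0 in every training image, so the matching columns of dW1 are always 0.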