I'm trying to build a classification neural network using only the NumPy library. I've written the whole network and gone through its logic, and it looks correct to me, so I can't tell what is keeping it from reaching good parameter values. One thing I noticed is that the weights in the first layer don't seem to change at all.
What could be causing the code not to work as expected?
import numpy as np
from tensorflow.keras.datasets import mnist

# Load MNIST, flatten each 28x28 image to a 784-vector, scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], -1)
x_test = x_test.reshape(x_test.shape[0], -1)
x_train, x_test = x_train / 255, x_test / 255
print(x_train.shape)

from sklearn.preprocessing import OneHotEncoder

# One-hot encode the labels; fit on the training labels, reuse for the test set
ohe = OneHotEncoder(sparse_output=False)
y_train = ohe.fit_transform(y_train.reshape(-1, 1))
y_test = ohe.transform(y_test.reshape(-1, 1))
print(y_train.shape)
def linear(x, deriv=False):
    if deriv:
        return np.ones_like(x)  # derivative of the identity function is 1
    return x

def relu(x, deriv=False):
    if deriv:
        return (x > 0).astype(float)
    return np.maximum(0, x)

def softmax(x, deriv=False):
    # deriv is ignored here: the softmax derivative is folded into the
    # cross-entropy delta (a - y) computed in backprop
    xre = x - x.max(axis=0, keepdims=True)  # shift for numerical stability
    xexp = np.exp(xre)
    a = xexp.sum(axis=0, keepdims=True)
    return xexp / a

def sigmoid(x, deriv=False):
    a = 1 / (1 + np.exp(-x))
    if deriv:
        return a * (1 - a)
    return a

activations = {'linear': linear, 'relu': relu, 'sigmoid': sigmoid, 'softmax': softmax}
def initialvals(cols=784):
    # Network shape: 784 inputs -> 10 hidden units -> 10 outputs
    shape = [cols, 10, 10]
    w = dict()
    b = dict()
    for i in range(len(shape) - 1):
        w[i + 1] = np.random.uniform(-0.5, 0.5, (shape[i + 1], shape[i]))
        b[i + 1] = np.zeros((shape[i + 1], 1))
    return w, b
def allprints(ww, bb):
    print('Weights')
    for i in ww:
        print(ww[i].shape)
    print('Biases')
    for i in bb:
        print(bb[i].shape)
    print('Weights')
    for i in ww:
        print(i)
        print(ww[i])
        print()
    print('Biases')
    for i in bb:
        print(i)
        print(bb[i])
        print()
def forprop(inputs, weight, bias, acts, av):
    # z[i]: pre-activations, a[i]: activations; columns are samples
    z = dict()
    a = dict()
    z[0] = inputs.T
    a[0] = acts[av[0]](z[0])
    for i in range(1, len(weight) + 1):
        z[i] = np.dot(weight[i], a[i - 1]) + bias[i]
        a[i] = acts[av[i]](z[i])
    return z, a
def backprop(inputs, output, weight, bias, acts, av, size=50, iters=20, lr=0.01):
    n_samples = inputs.shape[0]
    for k in range(iters):
        # Reshuffle the training set once per epoch
        shuff = np.random.permutation(n_samples)
        inputs = inputs[shuff]
        output = output[shuff]
        for i in range(0, n_samples, size):
            batch_inputs = inputs[i:i + size]
            batch_output = output[i:i + size]
            z, a = forprop(batch_inputs, weight, bias, acts, av)
            er = dict()
            # Output delta for softmax + cross-entropy: a - y
            er[len(bias)] = a[len(bias)] - batch_output.T
            for j in range(len(bias) - 1, 0, -1):
                # delta_h = w_h_o.T @ delta_o * activation'(z_h)
                er[j] = np.dot(weight[j + 1].T, er[j + 1]) * acts[av[j]](z[j], deriv=True)
            for j in range(1, len(bias) + 1):
                bias[j] -= lr * er[j].mean(axis=1, keepdims=True)
                weight[j] -= lr * (np.dot(er[j], a[j - 1].T) / batch_inputs.shape[0])
    return weight, bias, er, z
we, be = initialvals()
allprints(we, be)
w_calc, b_calc, er, z = backprop(x_train, y_train, we, be, activations,
                                 ['linear', 'sigmoid', 'softmax'])
allprints(w_calc, b_calc)
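To check how the parameters returned above actually perform, a small evaluation step can be appended to the script (a minimal sketch reusing forprop; pred and truth are illustrative names):

# Run the trained parameters on the test set; a[2] holds the softmax
# probabilities, one column per sample
_, a = forprop(x_test, w_calc, b_calc, activations, ['linear', 'sigmoid', 'softmax'])
pred = a[2].argmax(axis=0)        # predicted class per test image
truth = y_test.argmax(axis=1)     # true class from the one-hot labels
print('test accuracy:', (pred == truth).mean())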
I've checked the error values and the shapes of all the operations, and they don't seem to be the problem.
I've tried different learning rates and batch sizes, and I even built the same setup with the TensorFlow library, which gave good predictions, so the model architecture isn't the problem.
I've also tried initializing my parameters in different ways, e.g. with random.randn, zeros, etc.
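For instance, a randn-based variant along these lines (initialvals_randn is just an illustrative name; the 1/sqrt(fan_in) scaling is the usual Xavier-style choice):

def initialvals_randn(cols=784):
    # Hypothetical variant of initialvals: zero-mean Gaussians scaled by
    # 1/sqrt(fan_in) instead of uniform(-0.5, 0.5)
    shape = [cols, 10, 10]
    w = {i + 1: np.random.randn(shape[i + 1], shape[i]) / np.sqrt(shape[i])
         for i in range(len(shape) - 1)}
    b = {i + 1: np.zeros((shape[i + 1], 1)) for i in range(len(shape) - 1)}
    return w, b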
It took me a while to realize that sometimes things are not what they seem.
The weights of the first layer actually DO change on every iteration!
To convince yourself, add this line to the allprints function:
print(np.sum(abs(ww[i])))
Since W1 has 784 columns, the printout is truncated and only shows the first and last few rows/columns. But dW1 = er[1] @ X / n, where X is the training data. The images are grey digits on a black background, and since the digits are more or less centred, after flattening, the first and last few dozen entries of every image in the batch are 0 (black). When such a matrix is multiplied by another, the result has 0s at the beginning and end of each row. So under the update W1 <- W1 - lr * dW1, the numbers shown in the truncated printout don't change ... but many of the numbers that are not printed do.
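A quick way to see this directly, building on the code above (a minimal sketch; w_before and dw1 are names introduced here, and the deepcopy is needed because backprop updates the dicts in place):

import copy

w0, b0 = initialvals()
w_before = copy.deepcopy(w0)       # snapshot before training
w_after, b_after, _, _ = backprop(x_train, y_train, w0, b0, activations,
                                  ['linear', 'sigmoid', 'softmax'], iters=1)

dw1 = np.abs(w_after[1] - w_before[1]).sum(axis=0)  # total change per input pixel
print(dw1[:10])                # border columns: all zeros
print(dw1[350:360])            # central columns: clearly nonzero
print(x_train[:, :10].max())   # 0.0 -- these pixels are black in every image

The border columns stay exactly at their initial values because the corresponding pixels are 0 in every training image, so the matching columns of dW1 are always 0.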