Adaline single-layer neural network class in Python and ValueError: operands could not be broadcast together with shapes

Problem description

I'm not new to (supervised) machine learning, but I am a beginner in Python, and especially in numpy. Following an e-book, I'm trying to build a working example of an Adaline NN for a classification task. To start the full workflow, I import a dataframe, select features and target, split into training/test dataframes, and define a column transformation strategy:

import numpy as np
import pandas as pd
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
# plus the e-book's custom OutlierTrans and MakeOrdinal transformers,
# imported from its helper module

df = pd.read_excel("...DOCS_R-ExpFullImpTobaccoAdd_19Feb23.xlsx")

print(f"Data types of data columns:\n{df.dtypes}")

df.sample(8, random_state=6).T

df.isnull().sum()

# Define Vectors
num_cols = [
                "age",
                "tm_hospstay",
                "vit_bmi",
                "vit_temp",
                "rf_predmort",
                "rf_preddeep",
                "rf_predreop",
                "rf_meldscr",
                "ech_lveddpre",
                "ech_lvefpre"
                            ]
binary_cols = [
                "sex",
                "hx_afib",
                "hx_arrhythmia",
                "hx_cabg",
                "hx_cad",
                "hx_cancer",
                "rf_tobacco",
                "rf_diab",
                "rf_dyslip",
                "rf_dialysis",
                "rf_htn"
                        ]

cat_cols = [
                "admreasnb",
                "rf_copd"
                            ]

ordin_cols = ["vit_nyha",
              "vit_angccs"] # Ordinal Variables to be encoded

target = df[['out_infx']]

# Recast categorical, binary and ordinal columns as nullable integers
for var in cat_cols:
    df[var] = df[var].astype('Int64')

for var in binary_cols:
    df[var] = df[var].astype('Int64')

for var in ordin_cols:
    df[var] = df[var].astype('Int64')

# Recode Target (Any Infection) as 1 and -1 (in place of 1 and 0)
target =  np.where(target==0,-1,1).astype('int')

# Create a Vector of all predictors
features_cols = num_cols + binary_cols + cat_cols + ordin_cols

# Then a Dataframe
features = df[num_cols + binary_cols + cat_cols + ordin_cols]

# Finally create training and testing DataFrames
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=0, stratify=target)

# Setup Column Transformation
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(drop="first", sparse=False)

standtrans = make_pipeline(OutlierTrans(3),
  SimpleImputer(strategy="median"),
  StandardScaler())
ordintrans = make_pipeline(MakeOrdinal(),
  StandardScaler())
bintrans = make_pipeline(ohe)
cattrans = make_pipeline(ohe)

coltrans = ColumnTransformer(
  transformers=[
    ("stand", standtrans, num_cols),
    ("ord", ordintrans, ordin_cols),
    ("bin", bintrans, binary_cols),
    ("cat", cattrans, cat_cols),
  ]
)
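
For readers without the e-book: OutlierTrans and MakeOrdinal are small custom transformers that come with its helper code. A rough sketch of the kind of thing OutlierTrans(3) does, assuming the argument is an IQR-multiplier threshold and that flagged values are set to missing so the SimpleImputer can fill them (my paraphrase, not the book's exact code):

from sklearn.base import BaseEstimator, TransformerMixin

class OutlierTrans(BaseEstimator, TransformerMixin):
    """Sketch only: mark values more than `threshold` IQRs outside the
    quartiles as NaN, leaving imputation to the next pipeline step."""
    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        q1, q3 = X.quantile(0.25), X.quantile(0.75)
        iqr = q3 - q1
        too_far = (X < q1 - self.threshold * iqr) | (X > q3 + self.threshold * iqr)
        return X.mask(too_far)  # flagged cells become NaN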

Then I used the AdalineGD class from the book I have been reading:

class AdalineGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
      Learning rate (between 0.0 and 1.0)
    n_iter : int
      Passes over the training dataset.
    random_state : int
      Random number generator seed for random weight
      initialization.


    Attributes
    -----------
    w_ : 1d-array
      Weights after fitting.
    cost_ : list
      Sum-of-squares cost function value in each epoch.

    """
    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        """ Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_examples, n_features]
          Training vectors, where n_examples is the number of examples and
          n_features is the number of features.
        y : array-like, shape = [n_examples]
          Target values.

        Returns
        -------
        self : object

        """
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.cost_ = []

        for i in range(self.n_iter):
            net_input = self.net_input(X)
            # Please note that the "activation" method has no effect
            # in the code since it is simply an identity function. We
            # could write `output = self.net_input(X)` directly instead.
            # The purpose of the activation is more conceptual, i.e.,
            # in the case of logistic regression (as we will see later),
            # we could change it to
            # a sigmoid function to implement a logistic regression classifier.
            output = self.activation(net_input)
            errors = (y - output)
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation"""
        return X

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(self.net_input(X)) >= 0.0, 1, -1)
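
On its own the class seems fine: a quick sanity check I ran on made-up data (my own test, not from the book) fits without errors as long as y is a flat 1-D array of -1/1 labels:

import numpy as np

rng = np.random.RandomState(0)
X_toy = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
                   rng.normal(1.0, 1.0, size=(50, 2))])   # two blobs, 2 features
y_toy = np.hstack([np.full(50, -1), np.full(50, 1)])      # shape (100,), labels -1/1

ada_check = AdalineGD(eta=0.001, n_iter=20).fit(X_toy, y_toy)
print(ada_check.cost_[-1])           # cost shrinks over the epochs
print(ada_check.predict(X_toy[:5]))  # -1/1 class labels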

Finally, I try to put everything together with a pipeline:

ada = AdalineGD(eta=0.01, n_iter=15)

pipe1 = make_pipeline(coltrans, ada)

for i in tqdm(range(100), ncols=100, desc="Computation Time"):
    pipe1.fit(X_train, y_train)

Unfortunately, when I try to fit the ada model I get the following error:

ValueError: operands could not be broadcast together with shapes (34,) (34,3711) (34,) 

FYI, X_train.shape --> (3711, 25) and y_train.shape --> (3711, 1).

So I don't understand where the 34 comes from. Finally, I also tried converting the X_train dataframe to a plain numpy array using

pipe1.fit(X_train.values, y_train)

But then I get another error:

ValueError: Specifying the columns using strings is only supported for pandas DataFrames

So I am stuck now; any help, and above all a reasonable explanation of what is going on, would be greatly appreciated.

Here is a sample of the X_train dataframe:

      age  tm_hospstay  vit_bmi  vit_temp  rf_predmort  rf_preddeep  rf_predreop  rf_meldscr  ech_lveddpre  ech_lvefpre  sex  hx_afib  \
20     83          121    26.37     35.00         0.01         0.01         0.06        8.18            46        62.00    1        0   
1438   62          121    30.84     36.00         0.02         0.00         0.03        9.54            60        22.00    0        0   
3346   70           10    24.97     35.00         0.02         0.00         0.04       10.18            46        45.00    1        0   
2196   77           44    20.81     34.00         0.06         0.00         0.04       11.29            39        50.00    1        0   
59     55            7    25.96     36.00         0.02         0.01         0.02        6.40            51        56.30    0        0   
      hx_arrhythmia  hx_cabg  hx_cad  hx_cancer  rf_tobacco  rf_diab  rf_dyslip  rf_dialysis  rf_htn  admreasnb  rf_copd  vit_nyha  vit_angccs  
20                0        0       0          0           0        1          0            0       1          0        0         2           0  
1438              0        0       0          0           0        1          0            0       1          9        2         4           0  
3346              0        0       0          0           0        1          0            0       1          2        0         2           0  

and the y_train array:

array([[-1],
       [-1],
       [-1],
       ...,
       [-1],
       [-1],
       [-1]])

Hoping you can help.

python arrays pandas numpy neural-network
1 Answer

Weight matrix #1 should be (number_of_features + 1) x number_of_outputs_to_classify, and weight matrix #2 should be number_of_features x number_of_outputs_to_classify. This link covers the details: https://ml-cheatsheet.readthedocs.io/en/latest/forwardpropagation.html. Below is how I initialize the weight matrices by hand; note that I stack a bias row on top of both matrices:

def init_weights(self):
    # weight matrix #1: (n_features, n_outputs), filled with small random values
    _w1 = np.zeros((np.shape(self.x)[1], np.shape(self.y)[1]), dtype=float)
    for row in range(np.shape(_w1)[0]):
        for col in range(np.shape(_w1)[1]):
            _w1[row, col] = np.random.random(1)*0.1
    # weight matrix #2: (n_outputs, n_outputs), also small random values
    _w2 = np.zeros((np.shape(self.y)[1], np.shape(self.y)[1]), dtype=float)
    for row in range(np.shape(_w2)[0]):
        for col in range(np.shape(_w2)[1]):
            _w2[row, col] = np.random.random(1)*0.1
    # bias row stacked on top of each matrix
    b = np.zeros(np.shape(_w1)[1]).astype(float)
    b[0] = 1.0
    return np.vstack((b, _w1)), np.vstack((b, _w2))
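
To make the resulting shapes concrete, here is a small standalone check (my illustration, mirroring the code above for Iris-sized data with 4 features and 3 classes):

import numpy as np

x_demo = np.zeros((150, 4))   # e.g. Iris: 150 samples, 4 features
y_demo = np.zeros((150, 3))   # 3 one-hot encoded classes

_w1 = np.zeros((x_demo.shape[1], y_demo.shape[1]))   # (4, 3)
_w2 = np.zeros((y_demo.shape[1], y_demo.shape[1]))   # (3, 3)
b = np.zeros(_w1.shape[1])                           # bias row, (3,)
print(np.vstack((b, _w1)).shape)   # (5, 3)
print(np.vstack((b, _w2)).shape)   # (4, 3)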

A complete example on the Iris dataset:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
df.tail()
_y = df.iloc[0:150, 4].values
y = []
for i in range(len(_y)):
    if _y[i].endswith('setosa'):
        y.append([1.0, 0.0, 0.0])
    elif _y[i].endswith('versicolor'):
        y.append([0.0, 1.0, 0.0])
    elif _y[i].endswith('virginica'):
        y.append([0.0, 0.0, 1.0])
x = df.iloc[0:150, :4].values

def softmax(x):
    e_x = np.exp(x.astype(float) - np.max(x.astype(float)))
    return e_x / e_x.sum()

class NN:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.alpha = 0.3
        self.w1, self.w2 = self.init_weights()
        self.error_history = []
    
    def init_weights(self):
        _w1 = np.zeros((np.shape(self.x)[1], np.shape(self.y)[1]), dtype=float)
        for row in range(np.shape(_w1)[0]):
            for col in range(np.shape(_w1)[1]):
                _w1[row, col] = np.random.random(1)*0.1
        _w2 = np.zeros((np.shape(self.y)[1], np.shape(self.y)[1]), dtype=float)
        for row in range(np.shape(_w2)[0]):
            for col in range(np.shape(_w2)[1]):
                _w2[row, col] = np.random.random(1)*0.1
        b = np.zeros(np.shape(_w1)[1]).astype(float)
        b[0] = 1.0
        return np.vstack((b, _w1)), np.vstack((b, _w2))

    def train(self, n_iterations):
        for n in range(n_iterations):
            current_sample = np.random.randint(len(self.y))
            _inp = [self.x[current_sample, 0], self.x[current_sample, 1], self.x[current_sample, 2], self.x[current_sample, 3], 1.0]
            hid_inp = self.w1*np.reshape(_inp, [5, 1])
            hid_act = softmax(hid_inp)
            hid_act = np.sum(hid_act, axis=0)
            hid_act = np.reshape(np.array([hid_act[0], hid_act[1], hid_act[2], 1.0]), [4, 1])
            out_inp = hid_act*self.w2
            out_act = softmax(out_inp)
            err = self.y[current_sample] - np.sum(out_inp, axis=0)
            self.error_history.append(sum(err))
            dE_dW2 = err*out_act*(1.0 - out_act)*np.reshape([hid_act[0], hid_act[1], hid_act[2], 1.0], [4, 1])
            hid_err = (self.w2*err*np.reshape(hid_act, [4, 1])*np.reshape([1.0 - i for i in hid_act], [4, 1])).transpose()
            hid_err = np.sum(hid_err, axis=0)
            dE_dW1 = hid_err[:-1]*np.reshape(_inp, [5, 1])
            self.w1 = self.w1 + dE_dW1*self.alpha
            self.w2 = self.w2 + dE_dW2*self.alpha
        return self

m = NN(x, y)
m.train(500)

fig = plt.figure(figsize=(5, 5))
ax = plt.subplot(1, 1, 1)
ax.plot(np.arange(0, len(m.error_history), 1), m.error_history, ls='-', lw=1.5, color=[1, 0, 0])
ax.set_ylabel('Error')
ax.set_xlabel('Samples')
[ax.tick_params(axis=_axis, which='both', bottom='on', top=False, color='grey', labelcolor='grey') for _axis in ['x', 'y']]
[ax.spines[axis].set_visible(False) for axis in ["top", "right", "bottom", "left"]]
plt.show()

Output: a plot of the error history over the training samples (image not included here).
