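TensorFlow non-linear classification problem on Apple M4 chip

I have a question about the results of a non-linear (sigmoid) neural network classification in TensorFlow. I suspect the problem lies with the M chip and my installation, but I have tried several versions using miniforge, miniconda, and conda. I also installed the following in my conda environment: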

tensorflow-macos and tensorflow-metal


  • MacBook Pro, Apple M4 chip, macOS Sequoia 15.1.1
  • Python platform: macOS-15.1.1-arm64-arm-64bit
  • TensorFlow version: 2.16.2
  • Keras version: 3.7.0
  • Python 3.10.16 | packaged by conda-forge | (main, Dec 5 2024, 14:20:01) [Clang 18.1.8]
  • pandas 2.2.3
  • scikit-learn 1.5.2
  • GPU available

I run the same code in my local Jupyter and in Google Colab, but the results differ.

Classification from Google Colab

Classification from my local Jupyter


import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(model, X, y):
  """
  Plots the decision boundary created by a model predicting on X.
  This function has been adapted from two phenomenal resources:
   1. CS231n - https://cs231n.github.io/neural-networks-case-study/
   2. Made with ML basics - https://github.com/GokuMohandas/MadeWithML/blob/main/notebooks/08_Neural_Networks.ipynb
  """
  # Define the axis boundaries of the plot and create a meshgrid
  x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
  y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
  xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                       np.linspace(y_min, y_max, 100))

  # Create X values (we're going to predict on all of these)
  x_in = np.c_[xx.ravel(), yy.ravel()] # stack 2D arrays together: https://numpy.org/devdocs/reference/generated/numpy.c_.html

  # Make predictions using the trained model
  y_pred = model.predict(x_in)

  # Check for multi-class
  if model.output_shape[-1] > 1: # checks the final dimension of the model's output shape, if this is > (greater than) 1, it's multi-class
    print("doing multiclass classification...")
    # We have to reshape our predictions to get them ready for plotting
    y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
  else:
    print("doing binary classification...")
    # Round the sigmoid probabilities to hard 0/1 labels for plotting
    y_pred = np.round(np.max(y_pred, axis=1)).reshape(xx.shape)

  # Plot decision boundary
  plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
  plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
  plt.xlim(xx.min(), xx.max())
  plt.ylim(yy.min(), yy.max())



import tensorflow as tf

# X and y are the features and binary labels from the notebook (not shown here)
model3 = tf.keras.Sequential([
    tf.keras.layers.Dense(12, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model3.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    metrics=["accuracy"]
)

history = model3.fit(X, y, epochs=100)

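By changing the hidden layers from relu to sigmoid, you ensure that every layer applies a non-linear transformation across the entire input range. With relu, the model can end up in a regime where most neurons operate linearly (for example, if the values are all in the positive region, relu behaves essentially like the identity function). This can make the model behave almost linearly in practice, especially when the weight initialization and the data distribution keep the neurons inside relu's linear region.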
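
The collapse described above can be checked with a small pure-Python sketch. The weights `W1` and `W2` are hypothetical values picked for illustration (all positive, no biases); they are not taken from the model in the question. When every pre-activation stays positive, two relu layers give the same output as a single linear map, and the equality breaks only once relu actually clips something.

```python
# Illustrative sketch only: tiny hand-picked weights, not the asker's model.

def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# Hypothetical all-positive weights: positive inputs keep every
# pre-activation positive, so ReLU never clips.
W1 = [[0.5, 1.0], [1.5, 0.25]]
W2 = [[1.0, 2.0]]

def two_layer_relu(x):
    hidden = relu(matvec(W1, x))
    return relu(matvec(W2, hidden))

# The single linear map that the two layers reduce to in the positive region.
W_collapsed = matmul(W2, W1)

x_pos = [0.3, 0.7]
deep_out = two_layer_relu(x_pos)
linear_out = matvec(W_collapsed, x_pos)
print("positive input:", deep_out, linear_out)  # the two agree

# With a negative input ReLU clips, and the two outputs no longer match.
x_neg = [-1.0, 0.2]
print("negative input:", two_layer_relu(x_neg), matvec(W_collapsed, x_neg))
```

Whether a trained network actually ends up in this regime depends on the initialization and the data, so this only illustrates the mechanism, not a diagnosis of the local run.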

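In contrast, sigmoid always introduces curvature (non-linearity), squashing its output into the range between 0 and 1. This makes it hard for the network to stall in linear behavior, because even when the weights change only slightly, the sigmoid function maintains a non-linear mapping between input and output.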
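
As a rough illustration of the curvature argument, the sketch below compares relu and sigmoid on a positive interval. The helper `midpoint_gap` is a name introduced here for illustration: it measures how far a function's value at the midpoint is from the average of its values at the endpoints, which is zero for any function that is linear on that interval.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def midpoint_gap(f, a, b):
    """|f(midpoint) - average of f(a) and f(b)|; zero if f is linear on [a, b]."""
    return abs(f((a + b) / 2.0) - (f(a) + f(b)) / 2.0)

# A positive interval, matching the "all values in the positive region" case.
# (An interval symmetric about 0 would give a zero gap for sigmoid purely by
# symmetry, so it would not be a useful check.)
a, b = 1.0, 3.0

relu_gap = midpoint_gap(relu, a, b)        # relu is the identity here
sigmoid_gap = midpoint_gap(sigmoid, a, b)  # sigmoid bends on this interval

print("relu gap:", relu_gap)
print("sigmoid gap:", sigmoid_gap)
```

On this interval relu shows no gap while sigmoid does, which is the behavior the paragraph above relies on.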

