I have a question about the results of a nonlinear (sigmoid) neural network classification in TensorFlow. I suspect a problem with the M-series chip and my installation, but I have already tried several setups using miniforge, miniconda, and conda. I also installed the following in my conda environment:
tensorflow-macos and tensorflow-metal
I checked that my system meets the requirements.
I run the same code in local Jupyter and in Google Colab, but the results differ.
My Python visualization function is below. As you can see, the function comes from a course, and other users have not had problems with it.
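One way to narrow this down (a minimal sketch; the version checks and determinism flags are my suggestion, not part of the original setup) is to print the environment details and force deterministic ops in both notebooks before training:

import tensorflow as tf

# Print the versions in both environments; local tensorflow-macos/-metal
# and Colab's stock TensorFlow often differ, which alone can change results.
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))  # shows whether the Metal plugin is active

# Seed Python, NumPy, and TensorFlow in one call...
tf.keras.utils.set_random_seed(42)
# ...and request deterministic kernels (available in recent TF versions),
# so repeated runs on the same machine match exactly.
tf.config.experimental.enable_op_determinism()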
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(model, X, y):
    """
    Plots the decision boundary created by a model predicting on X.
    This function has been adapted from two phenomenal resources:
     1. CS231n - https://cs231n.github.io/neural-networks-case-study/
     2. Made with ML basics - https://github.com/GokuMohandas/MadeWithML/blob/main/notebooks/08_Neural_Networks.ipynb
    """
    # Define the axis boundaries of the plot and create a meshgrid
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    # Create X values (we're going to predict on all of these)
    x_in = np.c_[xx.ravel(), yy.ravel()]  # stack 2D arrays together: https://numpy.org/devdocs/reference/generated/numpy.c_.html
    # Make predictions using the trained model
    y_pred = model.predict(x_in)
    # Check for multi-class
    if model.output_shape[-1] > 1:  # if the final dimension of the model's output shape is greater than 1, it's multi-class
        print("doing multiclass classification...")
        # We have to reshape our predictions to get them ready for plotting
        y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
    else:
        print("doing binary classification...")
        y_pred = np.round(np.max(y_pred, axis=1)).reshape(xx.shape)
    # Plot decision boundary
    plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
My neural network code looks like this:
tf.random.set_seed(42)

model3 = tf.keras.Sequential([
    tf.keras.layers.Dense(12, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model3.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    metrics=["accuracy"]
)

history = model3.fit(X, y, epochs=100)
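For completeness, here is how X and y might be created and the result visualized (make_circles is my assumption for the course's 2-D toy dataset, not something stated above):

from sklearn.datasets import make_circles

# A 2-D toy dataset of the kind the plotting function expects (an assumption).
X, y = make_circles(n_samples=1000, noise=0.03, random_state=42)

# After training, visualize what the model has learned.
plot_decision_boundary(model3, X, y)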
By changing the hidden layers from relu to sigmoid, you ensure that every layer applies a nonlinear transformation across its entire input range. With relu, the model can end up in a regime where most neurons fire in the linear region (for example, if all pre-activations are positive, relu essentially behaves like the identity function). This can make the model behave almost linearly in practice, particularly when the weight initialization and the data distribution keep the neurons operating in relu's linear region.
Sigmoid, by contrast, always introduces curvature (nonlinearity), squashing outputs into the range between 0 and 1. This makes it hard for the network to stall in linear behavior, because even under small weight changes the sigmoid function maintains a nonlinear mapping between input and output.
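As a minimal sketch of that change (same architecture and training setup as model3 above, with only the hidden activations swapped to sigmoid; model4 is just a name used here):

tf.random.set_seed(42)

# Same architecture as model3, but with sigmoid hidden layers so every
# layer applies a squashing nonlinearity over its whole input range.
model4 = tf.keras.Sequential([
    tf.keras.layers.Dense(12, activation="sigmoid"),
    tf.keras.layers.Dense(8, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model4.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    metrics=["accuracy"]
)

history4 = model4.fit(X, y, epochs=100)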