I have a question about the results of a nonlinear (sigmoid) neural network classification in TensorFlow. I suspect a problem with the M-series chip and my installation, but I have already tried several setups using miniforge, miniconda, and conda. I also installed the following in my conda environment:
tensorflow-macos and tensorflow-metal
I checked that my system meets the requirements.
I run the same code in local Jupyter and in Google Colab, but the results differ.
My Python visualization function is below. As you can see, the function comes from a course, and other users have not had problems with it.
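One way to narrow this down (a minimal sketch; the version checks and determinism flags are my suggestion, not part of the original setup) is to print the environment details and force deterministic ops in both notebooks before training:

import tensorflow as tf

# Print the versions in both environments; local tensorflow-macos/-metal
# and Colab's stock TensorFlow often differ, which alone can change results.
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))  # shows whether the Metal plugin is active

# Seed Python, NumPy, and TensorFlow in one call...
tf.keras.utils.set_random_seed(42)
# ...and request deterministic kernels (available in recent TF versions),
# so repeated runs on the same machine match exactly.
tf.config.experimental.enable_op_determinism()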
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(model, X, y):
    """
    Plots the decision boundary created by a model predicting on X.
    This function has been adapted from two phenomenal resources:
     1. CS231n - https://cs231n.github.io/neural-networks-case-study/
     2. Made with ML basics - https://github.com/GokuMohandas/MadeWithML/blob/main/notebooks/08_Neural_Networks.ipynb
    """
    # Define the axis boundaries of the plot and create a meshgrid
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    # Create X values (we're going to predict on all of these)
    x_in = np.c_[xx.ravel(), yy.ravel()]  # stack 2D arrays together: https://numpy.org/devdocs/reference/generated/numpy.c_.html
    # Make predictions using the trained model
    y_pred = model.predict(x_in)
    # Check for multi-class
    if model.output_shape[-1] > 1:  # if the final dimension of the model's output shape is greater than 1, it's multi-class
        print("doing multiclass classification...")
        # We have to reshape our predictions to get them ready for plotting
        y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
    else:
        print("doing binary classification...")
        y_pred = np.round(np.max(y_pred, axis=1)).reshape(xx.shape)
    # Plot decision boundary
    plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.7)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
My neural network code looks like this:
tf.random.set_seed(42)

model3 = tf.keras.Sequential([
    tf.keras.layers.Dense(12, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model3.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    metrics=["accuracy"]
)

history = model3.fit(X, y, epochs=100)
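For completeness, here is how X and y might be created and the result visualized (make_circles is my assumption for the course's 2-D toy dataset, not something stated above):

from sklearn.datasets import make_circles

# A 2-D toy dataset of the kind the plotting function expects (an assumption).
X, y = make_circles(n_samples=1000, noise=0.03, random_state=42)

# After training, visualize what the model has learned.
plot_decision_boundary(model3, X, y)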
By changing the hidden layers from relu to sigmoid, you ensure that every layer applies a nonlinear transformation across its entire input range. With relu, the model can end up in a regime where most neurons fire in the linear region (for example, if all pre-activations are positive, relu essentially behaves like the identity function). This can make the model behave almost linearly in practice, particularly when the weight initialization and the data distribution keep the neurons operating in relu's linear region.
Sigmoid, by contrast, always introduces curvature (nonlinearity), squashing outputs into the range between 0 and 1. This makes it hard for the network to stall in linear behavior, because even under small weight changes the sigmoid function maintains a nonlinear mapping between input and output.
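As a minimal sketch of that change (same architecture and training setup as model3 above, with only the hidden activations swapped to sigmoid; model4 is just a name used here):

tf.random.set_seed(42)

# Same architecture as model3, but with sigmoid hidden layers so every
# layer applies a squashing nonlinearity over its whole input range.
model4 = tf.keras.Sequential([
    tf.keras.layers.Dense(12, activation="sigmoid"),
    tf.keras.layers.Dense(8, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model4.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    metrics=["accuracy"]
)

history4 = model4.fit(X, y, epochs=100)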