I trained a multivariate linear regression model on the data available in this .csv file: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data/downloads/new-york-city-airbnb-open-data.zip/3 — I trained it like this (gradient descent):
```python
# (The snippet does not show w, b and epoch being initialised; they are
# initialised here so the code stands on its own. The feature lists
# x1a..x7 and the target y come from the CSV above.)
w = [0.0] * 13
b = 0.0
epoch = 0

alpha = 0.1
rd: int = 0
while epoch <= 4000:
    rd = 0
    epoch += 1
    print("Epoch: " + str(epoch))
    while rd <= 48894:
        expected_y: float = x1a[rd] * w[0]
        expected_y += x1b[rd] * w[1]
        expected_y += x1c[rd] * w[2]
        expected_y += x1d[rd] * w[3]
        expected_y += x1e[rd] * w[4]
        expected_y += x2p[rd] * w[5]
        expected_y += x2e[rd] * w[6]
        expected_y += x2s[rd] * w[7]
        expected_y += x3[rd] * w[8]
        expected_y += x4[rd] * w[9]
        expected_y += x5[rd] * w[10]
        expected_y += x6[rd] * w[11]
        expected_y += x7[rd] * w[12] + b
        actual_y: float = y[rd]
        disparity: float = expected_y - actual_y
        b -= 2 * alpha * disparity * (1 / 48894)
        w[0] -= 2 * alpha * x1a[rd] * disparity * (1 / 48894)
        w[1] -= 2 * alpha * x1b[rd] * disparity * (1 / 48894)
        w[2] -= 2 * alpha * x1c[rd] * disparity * (1 / 48894)
        w[3] -= 2 * alpha * x1d[rd] * disparity * (1 / 48894)
        w[4] -= 2 * alpha * x1e[rd] * disparity * (1 / 48894)
        w[5] -= 2 * alpha * x2p[rd] * disparity * (1 / 48894)
        w[6] -= 2 * alpha * x2e[rd] * disparity * (1 / 48894)
        w[7] -= 2 * alpha * x2s[rd] * disparity * (1 / 48894)
        w[8] -= 2 * alpha * x3[rd] * disparity * (1 / 48894)
        w[9] -= 2 * alpha * x4[rd] * disparity * (1 / 48894)
        w[10] -= 2 * alpha * x5[rd] * disparity * (1 / 48894)
        w[11] -= 2 * alpha * x6[rd] * disparity * (1 / 48894)
        w[12] -= 2 * alpha * x7[rd] * disparity * (1 / 48894)
        rd += 1
    if epoch % 2 == 0:
        te = 0
        mean_squared_error: float = 0
        while te <= 48894:
            expected_y = x1a[te] * w[0] + x1b[te] * w[1] + x1c[te] * w[2] + x1d[te] * w[3] + x1e[te] * w[4]
            expected_y += x2p[te] * w[5]
            expected_y += x2e[te] * w[6] + x2s[te] * w[7] + x3[te] * w[8] + x4[te] * w[9] + x5[te] * w[10]
            expected_y += x6[te] * w[11]
            expected_y += x7[te] * w[12] + b
            actual_y = y[te]
            disparity = expected_y - actual_y
            mean_squared_error += disparity ** 2
            te += 1
        mean_squared_error /= 48895  # divide by the row count so this is a mean, not a sum
        print("\t\tEpoch: " + str(epoch) + "\n\t\tMSE:" + str(mean_squared_error))
```
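For reference, the per-sample loop above corresponds to the following batch form of the same update rule (the same `2 * alpha * x * disparity / n` gradient step), written with NumPy. This is only a sketch: the 13 feature columns and the target are a synthetic stand-in, since the Kaggle CSV is not loaded in this snippet.

```python
import numpy as np

# Hypothetical stand-in for the 13 feature columns x1a..x7 and the target y.
rng = np.random.default_rng(0)
n = 48895                              # row count of the Airbnb CSV
X = rng.normal(size=(n, 13))
y_true = X @ rng.normal(size=13) + 5.0

w = np.zeros(13)                       # one weight per feature, as in w[0]..w[12]
b = 0.0
alpha = 0.1

for epoch in range(100):
    disparity = X @ w + b - y_true             # expected_y - actual_y for every row at once
    w -= 2 * alpha * (X.T @ disparity) / n     # same gradient step as the 13 update lines
    b -= 2 * alpha * disparity.mean()          # same step for the bias

mse = float(np.mean((X @ w + b - y_true) ** 2))
```

The vectorized form makes the structure of the gradient easier to see and reason about than the thirteen hand-written update lines.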
The loss/cost function (mean squared error) decreased as expected until around epoch 712, and then started to increase (albeit slowly). What does this mean, and how can I prevent it?
[This happens when the learning rate (alpha) is too large: each gradient-descent step then overshoots, so after enough iterations the cost function stops decreasing, climbs back past the minimum, and the algorithm fails to converge or even diverges. In your case, taking smaller steps, i.e. lowering alpha, should let gradient descent keep converging toward the minimum.
The underlying goal is to minimise the cost function with respect to the parameters theta. With gradient descent, that requires alpha to be small enough that each update actually descends: the hypothesis should settle at the minimum rather than bounce back and forth across it. (For linear regression the MSE cost is convex, so there is a single global minimum; it is overshooting, not local optima, that makes your error rise.) A common practical remedy, besides simply lowering alpha, is to try a few values such as 0.3, 0.1, 0.03, 0.01 and keep the largest one for which the cost still decreases monotonically.