What is wrong with my hand-built gradient descent algorithm?

Question

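I am learning data science and machine learning. I wrote code for gradient descent optimization of the linear regression cost function without using any built-in Python library. To check whether my code is correct and to validate the results, I also implemented the same thing with a built-in Python library. The coefficient and intercept I get from my code do not match the ones I get from the built-in Python module. Could you point out what is wrong with the way I implemented gradient descent for linear regression?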
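For reference, these are the standard least-squares cost and batch gradient descent update that the code below is meant to implement, written with X as the m×2 design matrix (a column of ones plus the X column) and α as the learning rate. Note that J(θ) is a single number: the squared residuals are summed before dividing by 2m.

```latex
\begin{aligned}
h_\theta(x) &= \theta_0 + \theta_1 x \\
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 \\
\nabla_\theta J(\theta) &= \frac{1}{m} X^\top (X\theta - y) \\
\theta &\leftarrow \theta - \alpha\, \nabla_\theta J(\theta)
\end{aligned}
```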

My approach:

import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor

Data=pd.DataFrame({'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]})
Data.head()

sb.scatterplot(x ='X', y = 'Y', data = Data)
plt.show()

#generating column of ones
X0 = np.ones(len(Data)).reshape(-1,1)
#print(X0.shape)

X = Data.drop(['Y'], axis = 1).values
X_new = np.concatenate((X0,X), axis = 1)
#print(X_new)
#print(X_new.shape)

Y = Data.loc[:,['Y']].values
#print(Y)
#print(Y.shape)

# initial theta
theta =np.random.randint(low=0, high=1, size= X_new.shape[1]).reshape(-1,1)
#print(theta.shape)

J_history = []
theta_history = [list(theta.flatten())]

#gradient descent implementation
iterations = 1000
alpha = 0.01
m = len(Y)
for iter in range(1,iterations):
    H = X_new.dot(theta)
    loss = (H-Y)
    J = loss/(2*m)
    J_history.append(J)
    G = X_new.T.dot(loss)/m
    theta_new = theta - alpha*G    
    theta_history.append(list(theta_new.flatten()))
    theta = theta_new

# collecting costs (J) and coefficients (theta_0,theta_1)

theta_history.pop()
J_history = [i[0] for i in J_history]

params = pd.DataFrame()
params['J']=J_history

for i in range(len(theta_history[0])):
    params['theta_'+str(i)]=[k[i] for k in theta_history]

idx = params[params['J']==min(params['J'])].index
values = params.iloc[idx[0]][1:params.shape[1]].tolist()
print('intercept: {}, coeff: {}'.format(values[0],values[1]))

Using the built-in library:

import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor

Data=pd.DataFrame({'X': list(np.arange(0,10,1)), 'Y': [1,3,2,5,7,8,8,9,10,12]})
Data.head()

sb.scatterplot(x ='X', y = 'Y', data = Data)
plt.show()
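# 'squared_error' is the current name of the loss that older scikit-learn versions called 'squared_loss'
model = SGDRegressor(loss = 'squared_error', learning_rate = 'constant', eta0 = 0.01, max_iter= 1000)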
model.fit(Data['X'].values.reshape(-1,1), Data['Y'].values.reshape(-1,1))
print('coeff: {}, intercept: {}'.format(model.coef_, model.intercept_))

Answer 1

First of all, I appreciate the effort you have put into understanding and implementing the SGD algorithm yourself.

Now, back to your code. There are a few small mistakes that need fixing:

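- Your J is not a scalar but a numpy.array (one scaled residual per sample), yet you use it as if it were a scalar, so the code raises an error when it runs.
- After the loop, you have to select the iteration with the smallest error, and that error has to be built from squared residuals rather than from J itself, because the entries of J can be negative.
- As its name says, the scikit-learn SGDRegressor you are using is stochastic, and because your dataset is so small you need to run it several times and average the results if you want something reliable.
- Your learning rate of 0.01 seems a bit large.

After making these changes, I get results from your code that are comparable to SGDRegressor's.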
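To make the first point concrete, here is a small sketch that reuses the question's data and compares the shape of J as the question computes it with a scalar cost (half the mean squared residual). The names J_as_in_question and J_scalar are only illustrative.

```python
import numpy as np

# Same data and design matrix as in the question
x = np.arange(10, dtype=float)
Y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float).reshape(-1, 1)
X_new = np.column_stack([np.ones_like(x), x])  # shape (10, 2)
theta = np.zeros((2, 1))
m = len(Y)

loss = X_new.dot(theta) - Y
J_as_in_question = loss / (2 * m)               # shape (10, 1): one value per sample
J_scalar = float((loss ** 2).sum() / (2 * m))   # a single number: sum of squares / (2m)

print(J_as_in_question.shape)  # (10, 1)
print(J_scalar)
```

Only the scalar version can be compared with `==` or passed to `min()` in the usual way, which is what the selection step at the end of the loop needs.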

import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt

Data = pd.DataFrame({'X': list(np.arange(0, 10, 1)), 'Y': [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]})
Data.head()

sb.scatterplot(x='X', y='Y', data=Data)
plt.show()

# generating column of ones
X0 = np.ones(len(Data)).reshape(-1, 1)

X = Data.drop(['Y'], axis=1).values
X_new = np.concatenate((X0, X), axis=1)

Y = Data.loc[:, ['Y']].values

# initial theta (all zeros)
theta = np.zeros((X_new.shape[1], 1))

J_history = []
theta_history = [list(theta.flatten())]

# gradient descent implementation
iterations = 2000
alpha = 0.001
m = len(Y)
for _ in range(1, iterations):
    H = X_new.dot(theta)
    loss = (H - Y)
    # scalar cost: sum of squared residuals / (2m)
    J = float((loss ** 2).sum() / (2 * m))
    J_history.append(J)
    G = X_new.T.dot(loss) / m
    theta_new = theta - alpha * G
    theta_history.append(list(theta_new.flatten()))
    theta = theta_new

# the last theta was never evaluated, so drop it to keep both lists aligned
theta_history.pop()

# collecting costs (J) and coefficients (theta_0, theta_1)
params = pd.DataFrame()
params['J'] = J_history
for i in range(len(theta_history[0])):
    params['theta_' + str(i)] = [k[i] for k in theta_history]

idx = params[params['J'] == params['J'].min()].index
values = params.iloc[idx[0]][1:params.shape[1]].tolist()
print('intercept: {}, coeff: {}'.format(values[0], values[1]))
#> intercept: 0.654041555750147, coeff: 1.2625626277290982
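A minimal sketch of the same batch gradient descent wrapped in a function, assuming the cost is the sum of squared residuals divided by 2m and that a gradient-norm tolerance is an acceptable stopping rule. The function name gradient_descent and the tol and max_iters parameters are illustrative, not part of the original code. Run until the gradient is essentially zero, it settles at the ordinary least-squares fit instead of stopping after a fixed number of iterations.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, max_iters=50_000, tol=1e-9):
    """Batch gradient descent for least-squares linear regression.

    X: (m, n) design matrix that already contains a column of ones.
    y: (m,) target vector.
    Returns the parameter vector theta and the list of costs J(theta).
    """
    m, n = X.shape
    theta = np.zeros(n)
    costs = []
    for _ in range(max_iters):
        residuals = X @ theta - y                      # h_theta(x_i) - y_i, shape (m,)
        costs.append(residuals @ residuals / (2 * m))  # scalar cost J(theta)
        grad = X.T @ residuals / m                     # gradient of J, shape (n,)
        if np.linalg.norm(grad) < tol:                 # stop once the gradient is ~0
            break
        theta = theta - alpha * grad
    return theta, costs

x = np.arange(10, dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)
X = np.column_stack([np.ones_like(x), x])
theta, costs = gradient_descent(X, y)
print('intercept: {}, coeff: {}'.format(theta[0], theta[1]))
```

With a learning rate this small the cost decreases at every step, so the last iterate is also the one with the lowest cost and no separate "pick the minimum" pass is needed.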

Now let's look at the scikit-learn model:
import numpy as np
from sklearn.linear_model import SGDRegressor

intercepts = []
coefs = []
# SGDRegressor is stochastic, so average the fitted parameters over many runs
for _ in range(500):
    # 'squared_error' is the current name of the loss older versions called 'squared_loss'
    model = SGDRegressor(loss='squared_error', learning_rate='constant', eta0=0.01, max_iter=1000)
    model.fit(Data['X'].values.reshape(-1, 1), Data['Y'].values.reshape(-1))
    intercepts.append(model.intercept_)
    coefs.append(model.coef_)

intercept = np.concatenate(intercepts).mean()
coef = np.vstack(coefs).mean(0)
print('intercept: {}, coeff: {}'.format(intercept, coef))
#> intercept: 0.6912403374422401, coeff: [1.24932246]
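Both results above come from iterative methods with a fixed budget, so a deterministic reference is useful when validating them: ordinary least squares has an exact solution. A minimal sketch using numpy.linalg.lstsq on the same data:

```python
import numpy as np

x = np.arange(10, dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)
X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept

# Exact solution of min ||X @ theta - y||^2
theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print('intercept: {}, coeff: {}'.format(theta[0], theta[1]))
# intercept ≈ 1.2364, coeff ≈ 1.1697
```

The gap between these values and the two printed above suggests that neither the 2000-step gradient descent nor the averaged SGDRegressor runs have fully converged yet, which is worth keeping in mind when comparing them with each other.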
