I'm trying to solve the following problem, but I get an error:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics.regression import r2_score
import numpy as np
degrees = np.arange(0, 9)
np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10
for i in degrees:
    poly = PolynomialFeatures(i)
    x_poly = poly.fit_transform(x)
    X_train, X_test, y_train, y_test = train_test_split(x_poly, y, random_state = 0)
    linreg = LinearRegression().fit(X_train, y_train)
    r2_train = linreg.r2_score(X_train, y_train)
    r2_test = linreg.r2_train(X_test, y_test)
Found input variables with inconsistent numbers of samples: [1, 15]
What is causing the error above?
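A likely reading of the [1, 15] in the message: x has shape (15,), and the scikit-learn version in use appears to have treated that 1-D array as a single sample with 15 features, so x_poly ended up with 1 row while y has 15 entries, and train_test_split rejected the mismatch. (Recent versions reject 1-D input to fit_transform earlier, with an "Expected 2D array" error.) The sketch below is only a reproduction under that assumption: it builds the one-row shape explicitly with reshape(1, -1) instead of relying on any particular version's behavior.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n = 15
x = np.linspace(0, 10, n)
y = np.sin(x)

# Assumption: this mimics an older scikit-learn reading a 1-D array as ONE sample with n features.
one_sample = x.reshape(1, -1)   # shape (1, 15)
# What is actually intended: n samples with one feature each.
n_samples = x.reshape(-1, 1)    # shape (15, 1)

message = ""
try:
    train_test_split(one_sample, y, random_state=0)
except ValueError as err:
    message = str(err)
    print(message)  # Found input variables with inconsistent numbers of samples: [1, 15]

# With the (15, 1) layout the split works (default test_size=0.25 -> 4 test rows).
X_train, X_test, y_train, y_test = train_test_split(n_samples, y, random_state=0)
print(X_train.shape, X_test.shape)  # (11, 1) (4, 1)
```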
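Make x a 2D numpy array using x.reshape(-1,1).
linreg.r2_score is not valid. Also, there is no need to use r2_score: just use linreg.score, which returns the coefficient of determination R^2 of the prediction (see the scikit-learn documentation for LinearRegression.score).
With degree 0, PolynomialFeatures produces only a constant column, so the model simply predicts the mean of y_train and the training r2_score is 0. So use PolynomialFeatures(i+1) inside the loop, unless you really intend to use a degree-0 polynomial expansion. Keep in mind that if an input sample is two-dimensional and of the form [a, b], its degree-2 polynomial features are [1, a, b, a^2, ab, b^2]. A complete working example: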
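To make the [1, a, b, a^2, ab, b^2] ordering concrete, here is a small check with a = 2 and b = 3 (values chosen arbitrarily for illustration), plus the single-column case that the question actually uses, where the expansion reduces to plain powers of x:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: a = 2, b = 3.
sample = np.array([[2.0, 3.0]])
features = PolynomialFeatures(degree=2).fit_transform(sample)
print(features)  # [[1. 2. 3. 4. 6. 9.]]  ->  [1, a, b, a^2, ab, b^2]

# One sample with a single feature x = 2: degree 3 gives [1, x, x^2, x^3].
cubic = PolynomialFeatures(degree=3).fit_transform(np.array([[2.0]]))
print(cubic)  # [[1. 2. 4. 8.]]
```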
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
import numpy as np
degrees = np.arange(0, 9)
np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10
for i in degrees:
    # i+1 skips the degree-0 expansion (a constant column only)
    poly = PolynomialFeatures(i+1)
    # reshape to (n_samples, n_features) = (15, 1)
    x_poly = poly.fit_transform(x.reshape(-1,1))
    X_train, X_test, y_train, y_test = train_test_split(x_poly, y, random_state = 0)
    linreg = LinearRegression().fit(X_train, y_train)
    # score() returns R^2 directly, so r2_score is not needed
    r2_train = linreg.score(X_train, y_train)
    r2_test = linreg.score(X_test, y_test)
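Why PolynomialFeatures(i+1) rather than PolynomialFeatures(i): at degree 0 the expansion keeps only the bias column of ones, which carries no information about x, so LinearRegression can do no better than predicting the mean of y_train. A short sketch on the question's data (it assumes a scikit-learn version that accepts degree=0, which it does when include_bias is left at its default of True):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
y = np.sin(x) + x / 6 + np.random.randn(n) / 10

# Degree 0 keeps only the bias column: every row becomes [1.0].
x_poly = PolynomialFeatures(0).fit_transform(x.reshape(-1, 1))
print(x_poly.shape)  # (15, 1)

X_train, X_test, y_train, y_test = train_test_split(x_poly, y, random_state=0)
linreg = LinearRegression().fit(X_train, y_train)

# A constant feature is useless, so every prediction equals mean(y_train).
print(np.allclose(linreg.predict(X_train), y_train.mean()))  # True
train_r2 = linreg.score(X_train, y_train)
print(train_r2)  # approximately 0, up to floating-point rounding
```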
You did not reshape x. x must have shape (n_samples, n_features). Also, linreg.r2_score does not exist. I have modified the code below.
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
import numpy as np
degrees = np.arange(0, 9)
np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10
x = x.reshape(-1, 1)  # shape (n_samples, n_features) = (15, 1)
for i in degrees:
    poly = PolynomialFeatures(i)
    x_poly = poly.fit_transform(x)
    X_train, X_test, y_train, y_test = train_test_split(x_poly, y, random_state = 0)
    linreg = LinearRegression().fit(X_train, y_train)
    r2_train = linreg.score(X_train, y_train)
    r2_test = linreg.score(X_test, y_test)
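One thing all of these loops share: r2_train and r2_test are overwritten on every iteration, so only the last degree's scores survive the loop. If the goal is to compare degrees, one option is to keep one score per degree. A minimal sketch over the same degrees 0 to 8; the arrays train_scores and test_scores are illustrative names, not something the question asks for:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
y = np.sin(x) + x / 6 + np.random.randn(n) / 10
x = x.reshape(-1, 1)  # (n_samples, n_features)

degrees = np.arange(0, 9)
train_scores = np.zeros(len(degrees))
test_scores = np.zeros(len(degrees))

for idx, degree in enumerate(degrees):
    x_poly = PolynomialFeatures(degree).fit_transform(x)
    X_train, X_test, y_train, y_test = train_test_split(x_poly, y, random_state=0)
    linreg = LinearRegression().fit(X_train, y_train)
    # Store each degree's R^2 instead of overwriting a single variable.
    train_scores[idx] = linreg.score(X_train, y_train)
    test_scores[idx] = linreg.score(X_test, y_test)

for degree, r2_tr, r2_te in zip(degrees, train_scores, test_scores):
    print(f"degree {degree}: train R^2 = {r2_tr:.3f}, test R^2 = {r2_te:.3f}")
```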
Your code has many errors and typos. It would help to practice on some well-known problems first, such as iris or house-price regression.
Corrected code:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
import numpy as np
degrees = np.arange(0, 9)
np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10
#### convert x into a 2D matrix of shape (n_samples, 1) #####
x = x.reshape(-1,1)
for i in degrees:
    poly = PolynomialFeatures(i)
    x_poly = poly.fit_transform(x)
    X_train, X_test, y_train, y_test = train_test_split(x_poly, y, random_state = 0)
    linreg = LinearRegression().fit(X_train, y_train)
    r2_train = r2_score(y_train, linreg.predict(X_train))
    r2_test = r2_score(y_test, linreg.predict(X_test))
#### linreg.score(X_train, y_train) can also be used to calculate the R^2 score
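The last comment can be checked directly: linreg.score and r2_score both compute the coefficient of determination R^2 = 1 - SS_res / SS_tot. The sketch below uses a degree-3 model on the question's data; the degree is an arbitrary choice made only for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
y = np.sin(x) + x / 6 + np.random.randn(n) / 10

x_poly = PolynomialFeatures(3).fit_transform(x.reshape(-1, 1))
X_train, X_test, y_train, y_test = train_test_split(x_poly, y, random_state=0)
linreg = LinearRegression().fit(X_train, y_train)

y_pred = linreg.predict(X_test)
ss_res = np.sum((y_test - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # total sum of squares
manual_r2 = 1 - ss_res / ss_tot

print(np.isclose(manual_r2, r2_score(y_test, y_pred)))      # True
print(np.isclose(manual_r2, linreg.score(X_test, y_test)))  # True
```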