Why is my linear regression score so low?

Question

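I scraped statistics from Pgatour.com, which updates its stats regularly. The site has historical data going back to 2010, which I scraped into flat CSV files and read in successfully with pd.read_csv(filename). This scrape covers 2010-2019.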
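If each season was saved to its own CSV file, one way to combine them is to read each file with pd.read_csv and stack the results with pd.concat. This is a minimal sketch: the folder layout and the file-name pattern stats_{year}.csv are hypothetical placeholders, not the names actually used in the scrape, and the SEASON column is an illustrative addition.

```python
from pathlib import Path

import pandas as pd


def load_seasons(folder, years=range(2010, 2020), pattern="stats_{year}.csv"):
    """Read one CSV per season and stack them into a single DataFrame.

    The file-name pattern "stats_{year}.csv" is a hypothetical placeholder;
    adjust it to however the scraped files are actually named.
    """
    frames = []
    for year in years:
        path = Path(folder) / pattern.format(year=year)
        if not path.exists():
            continue  # skip seasons that were not scraped
        season = pd.read_csv(path)
        season["SEASON"] = year  # remember which year each row came from
        frames.append(season)
    if not frames:
        raise FileNotFoundError(f"no season files found in {folder}")
    return pd.concat(frames, ignore_index=True)
```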

      SG_P   SG_T  SG_TTG  SG_OTT  SG_ATG  POINTS
7    0.243  1.195   0.952   0.338   0.168   718.0
8    0.098  1.192   1.091   0.724   0.260   445.0
9   -0.147  1.001   1.151   0.185   0.738   843.0
11   0.054  0.984   0.927   0.151   0.507   718.0
12   0.137  1.156   1.014   0.403   0.642   500.0
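For experimenting, the five sample rows above can be rebuilt as a DataFrame. In these rows SG_T equals SG_P + SG_TTG to within about 0.005, which suggests (judging from these five rows only) that SG_T largely duplicates information already in the other two columns. Redundant columns like this do not lower R² for ordinary least squares, but they make the fitted coefficients unstable and hard to interpret.

```python
import pandas as pd

# The five sample rows shown above (index = original row labels).
sample = pd.DataFrame(
    {
        "SG_P": [0.243, 0.098, -0.147, 0.054, 0.137],
        "SG_T": [1.195, 1.192, 1.001, 0.984, 1.156],
        "SG_TTG": [0.952, 1.091, 1.151, 0.927, 1.014],
        "SG_OTT": [0.338, 0.724, 0.185, 0.151, 0.403],
        "SG_ATG": [0.168, 0.260, 0.738, 0.507, 0.642],
        "POINTS": [718.0, 445.0, 843.0, 718.0, 500.0],
    },
    index=[7, 8, 9, 11, 12],
)

# In these rows SG_T matches SG_P + SG_TTG up to rounding,
# so SG_T adds little that the other two columns don't already carry.
residual = sample["SG_T"] - (sample["SG_P"] + sample["SG_TTG"])
print(residual.abs().max())
```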

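After building a new DataFrame, I kept only the 'SG' (Strokes Gained) statistics, which have a non-linear relationship with a golfer's 'Points', as features. I then ran train_test_split, holding out 33% of the data (test_size=0.33) for testing. The target variable is 'POINTS'.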
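The feature selection and split described above can be sketched as follows. The DataFrame here is random stand-in data, not the real PGA Tour stats; the only assumptions about df2 are that its feature columns start with SG_ and that the target column is POINTS. Selecting columns by name rather than by position also guards against accidentally including the target among the features.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df2: random SG_* columns plus a POINTS target.
rng = np.random.default_rng(0)
sg_cols = ["SG_P", "SG_T", "SG_TTG", "SG_OTT", "SG_ATG"]
df2 = pd.DataFrame(rng.normal(size=(100, 5)), columns=sg_cols)
df2["POINTS"] = rng.uniform(0, 1500, size=100)

# Select features by name instead of by position, so the target can't leak in.
X = df2.filter(like="SG_")
y = df2["POINTS"]

# test_size=0.33 holds out 33% of the rows (not 0.33%).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=12)
print(len(X_train), len(X_test))
```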

In other Kaggle runs of this project, the R² scores typically land around 0.70.

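With scikit-learn's plain linear regression, my scores fall in the .25-.30 range, which means the model badly underfits, and the fit looks poor when plotted with Seaborn.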

training set r^2 score = 0.2601442196444287
testing set r^2 score = 0.2602966900574226
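LinearRegression.score reports the coefficient of determination R², so the .70 figure from Kaggle and the .26 here are both R² values rather than accuracies. R² compares the model's squared error with the error of always predicting the mean: 1.0 is a perfect fit, 0.0 is no better than the mean, and 0.26 means the model explains roughly a quarter of the variance in POINTS. The helper r_squared below is a hypothetical illustration checked against sklearn.metrics.r2_score; the predictions are made up.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left over after the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of always predicting the mean
    return 1.0 - ss_res / ss_tot


y_true = [718.0, 445.0, 843.0, 718.0, 500.0]
y_pred = [650.0, 520.0, 700.0, 690.0, 560.0]  # made-up predictions
print(r_squared(y_true, y_pred), r2_score(y_true, y_pred))
```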

Here is the linear regression code:

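from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# df2 holds the SG_* feature columns followed by POINTS as the last column

# Split features and target

X = df2.iloc[:, :-1]  # All columns except the last one (the features)
y = df2.iloc[:, -1]   # Just the last column, POINTS (the target)

# Train test split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=12)

# normalize=True was removed in scikit-learn 1.2; it does not change OLS predictions anyway
lr = LinearRegression(n_jobs=-1)
lr.fit(X_train, y_train)

print("training set r^2 score =", lr.score(X_train, y_train))
print("testing set r^2 score =", lr.score(X_test, y_test))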

training set r^2 score = 0.2601442196444287
testing set r^2 score = 0.2602966900574226
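The normalize parameter of LinearRegression was deprecated in scikit-learn 1.0 and removed in 1.2, so passing it fails on current versions. It is also unlikely to be the cause of the low score: for ordinary least squares with an intercept, rescaling the features changes the coefficients but not the predictions, so R² stays the same. A minimal check on synthetic data (not the golf data), using StandardScaler in a pipeline as the modern way to scale features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the SG features and POINTS.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([50.0, 20.0, 30.0, 10.0, 5.0]) + rng.normal(scale=40.0, size=200)

plain = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# Ordinary least squares with an intercept gives the same predictions
# (and therefore the same R^2) whether or not the features are rescaled.
print(plain.score(X, y), scaled.score(X, y))
```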

Here is the polynomial version:

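from sklearn.preprocessing import PolynomialFeatures

degree = 2  # Start with 2
poly = PolynomialFeatures(degree, include_bias=False)

X_poly = poly.fit_transform(X)  # Returns a NumPy array, no longer a pandas DataFrame
y_poly = y  # Target is unchanged

# No test_size here, so train_test_split uses its default 25% test split
# (the linear model above held out 33%), so the two sets of scores are not directly comparable
X_poly_train, X_poly_test, y_poly_train, y_poly_test = train_test_split(X_poly, y_poly, random_state=12)

lr_poly = LinearRegression()
lr_poly.fit(X_poly_train, y_poly_train)

print("training set r^2 score =", lr_poly.score(X_poly_train, y_poly_train))
print("testing set r^2 score =", lr_poly.score(X_poly_test, y_poly_test))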

training set r^2 score = 0.2902297270799855
testing set r^2 score = 0.1746156333412796
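The polynomial snippet calls train_test_split without test_size, so it uses the default 25% test split, while the linear model held out 33%; the two sets of scores therefore come from different splits. A sketch of a like-for-like comparison on synthetic data (the data and coefficients are made up for illustration) wraps PolynomialFeatures and LinearRegression in one pipeline and reuses the same split for every degree. Assuming the five SG_ columns from the sample, degree 2 expands them to 20 features, which gives the model more room to fit noise; a training score well above the test score, as in 0.29 vs 0.17, is consistent with that.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data with one quadratic term.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 40 * X[:, 0] + 15 * X[:, 1] ** 2 + rng.normal(scale=20.0, size=300)

# Use the SAME split (size and seed) for every degree so the scores are comparable.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=12)

scores = {}
for degree in (1, 2):
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),  # LinearRegression fits its own intercept
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    scores[degree] = (model.score(X_train, y_train), model.score(X_test, y_test))
    n_features = model.named_steps["polynomialfeatures"].n_output_features_
    print(f"degree={degree} features={n_features} "
          f"train={scores[degree][0]:.3f} test={scores[degree][1]:.3f}")
```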

In the Kaggle benchmark notebook, I see these results:

Linear:

training set r^2 score = 0.5540673510136147
testing set r^2 score = 0.510807136771844

Poly:

training set r^2 score = 0.7466513181026075
testing set r^2 score = 0.6325248963195537

Answer 1

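Data preprocessing is a very important step before training. In this step the data is transformed to reduce irrelevant and redundant information, handle invalid values, and so on. Typical preprocessing operations include handling null values, dimensionality reduction, and data normalization. If you skip this step, you may end up with a powerful, carefully designed model that still performs poorly on real data. Try it; your results may improve!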
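A minimal sketch of the preprocessing this answer describes, assuming scikit-learn: SimpleImputer fills missing values and StandardScaler standardizes the features, chained with LinearRegression in a Pipeline so the same fitted steps are applied to any new data. The data below is synthetic. Keep in mind that scaling by itself does not change the R² of plain least squares; it matters more for regularized models such as Ridge or Lasso. Handling missing values, by contrast, is required, since LinearRegression raises an error on NaN inputs.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: SG_* features with a few missing values, POINTS as target.
rng = np.random.default_rng(3)
X = pd.DataFrame(rng.normal(size=(120, 3)), columns=["SG_P", "SG_OTT", "SG_ATG"])
y = 300 * X["SG_OTT"] + 150 * X["SG_P"] + rng.normal(scale=50.0, size=120)
X.iloc[::10, 0] = np.nan  # knock out every 10th SG_P value

model = make_pipeline(
    SimpleImputer(strategy="median"),  # fill missing values with the column median
    StandardScaler(),                  # put every feature on the same scale
    LinearRegression(),
)
model.fit(X, y)
print(model.score(X, y))
```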
