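I'm working on my first data science project, which predicts NBA players' salaries. I've tried two models on the data, but my accuracy (R²) scores are quite low. Can anyone help me improve them? Thanks.
r2_score for linear regression: 0.5836029556187516
r2_score for random forest regressor: 0.6287935547320641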
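Note that `r2_score` is not a classification accuracy. It computes the coefficient of determination, 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. A score of 1 is a perfect fit, and 0 means the model does no better than always predicting the mean salary. Scores of 0.58 and 0.63 therefore mean the models explain roughly 58% and 63% of the variance in the test-set salaries. The small sketch below uses made-up numbers (not NBA data) to compute the metric by hand and compare it with scikit-learn:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up salaries (in millions) and predictions, for illustration only
y_true = np.array([2.0, 5.0, 9.0, 14.0, 30.0])
y_hat = np.array([3.0, 4.0, 10.0, 12.0, 25.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_hat))  # the two values agree

# Always predicting the mean gives an R^2 of 0
baseline = r2_score(y_true, np.full_like(y_true, y_true.mean()))
print(baseline)
```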
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Load the data and keep the target plus the numeric stat columns
df = pd.read_csv('nba_eda.csv')
df_model = df[['CurrentSalary', 'PTS', 'MP', 'Age', 'G', 'WS', 'STL', 'TRB', 'AST', 'BLK', 'TOV']]
df_test = pd.get_dummies(df_model)  # one-hot encodes any non-numeric columns; numeric ones pass through

# Split into features and target
X = df_test.drop('CurrentSalary', axis=1)
Y = df_test.CurrentSalary.values

# Hold out 30% of the rows as a test set
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

# Linear regression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
print('linear regression train R^2:', regressor.score(x_train, y_train))
y_pred = regressor.predict(x_test)
accuracy = r2_score(y_test, y_pred)
print('r2_score for linear regression:', accuracy)

# Random forest (random_state fixed so the score is reproducible)
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(x_train, y_train)
prediction = rf.predict(x_test)
acc = r2_score(y_test, prediction)
print('r2_score for random forest regressor:', acc)
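A single 70/30 split gives one fairly noisy estimate of each model's score, so small differences between models may not be meaningful. A minimal sketch of comparing the same two models with 5-fold cross-validation (`cross_val_score`) follows. It assumes synthetic data from `make_regression` as a stand-in for the real `X` and `Y`, since `nba_eda.csv` is not available here:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the NBA feature matrix (10 stat columns) and salaries;
# replace with the real X and Y built from nba_eda.csv
X, Y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    'linear regression': LinearRegression(),
    'random forest': RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # One R^2 score per fold, each computed on that fold's held-out rows
    scores = cross_val_score(model, X, Y, cv=cv, scoring='r2')
    results[name] = scores
    print(f'{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})')
```

The standard deviation across folds gives a rough sense of how much the score moves from one split to another.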
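I think trying a more complex model might improve the score. You could try polynomial regression; here is a code example (it reuses the train/test split from the question):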
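from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Add squared terms and pairwise interactions of the stat columns
pf = PolynomialFeatures(degree=2)
x_train_poly = pf.fit_transform(x_train)  # fit the expansion on the training split only
x_test_poly = pf.transform(x_test)
linModel = LinearRegression()
linModel.fit(x_train_poly, y_train)
print('r2_score for polynomial regression:', r2_score(y_test, linModel.predict(x_test_poly)))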
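You can also try increasing the degree of the polynomial features, but check the test-set score as you do: higher degrees add many more features and can overfit.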
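One way to pick the degree, rather than guessing, is to cross-validate a few candidates and keep the best mean score. This sketch wraps `PolynomialFeatures` and `LinearRegression` in a scikit-learn pipeline so each candidate degree is evaluated as a single model. The data is synthetic (a made-up quadratic relationship standing in for the NBA features), so the selected degree here says nothing about the real salary data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data with a quadratic relationship; replace with the real X and Y
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 3))
Y = X[:, 0] ** 2 + 2 * X[:, 1] * X[:, 2] - X[:, 2] + rng.normal(0, 0.5, size=200)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
mean_scores = {}
for degree in (1, 2, 3, 4):
    # The pipeline lets cross_val_score treat expansion + regression as one model
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    mean_scores[degree] = cross_val_score(model, X, Y, cv=cv, scoring='r2').mean()
    print(f'degree {degree}: mean R^2 = {mean_scores[degree]:.3f}')

best_degree = max(mean_scores, key=mean_scores.get)
print('best degree:', best_degree)
```

On data like this, degree 1 misses the squared and interaction terms, while degrees above 2 add features without adding real signal.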