Why does my sklearn linear regression model produce perfect predictions?


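I'm trying to run a multiple linear regression with sklearn and have carried out the steps below. However, when I use the trained model to predict y_pred, I get a perfect r^2 = 1.0. Does anyone know why this is happening, or what is wrong with my code?

Also, sorry: I'm new to this site, so I'm not fully familiar with the format/etiquette for questions!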

import numpy as np
import pandas as pd

# Import and subset data
ml_data_all = pd.read_excel('C:/Users/User/Documents/RSEM/STADM/Coursework/Crime_SF/Machine_learning_collated_data.xlsx')
ml_data_1218 = ml_data_all[ml_data_all['Year'] >= 2012]

ml_data_1218.drop(columns=['Pop_MOE',
                           'Pop_density_MOE',
                           'Age_median_MOE',
                           'Sex_ratio_MOE',
                           'Income_median_household_MOE',
                           'Pop_total_pov_status_determ_MOE',
                           'Pop_total_50percent_pov_MOE',
                           'Pop_total_125percent_pov_MOE',
                           'Poverty_percent_below_MOE',
                           'Total_labourforceMOE',
                           'Unemployed_total_MOE',
                           'Unemployed_total_male_MOE'], inplace=True)

# Taking care of missing data
# Delete rows containing any NaNs
ml_data_1218.dropna(axis=0,
                   how='any',
                   inplace=True)

# DATA PREPROCESSING

# Defining X and y
X = ml_data_1218.drop(columns=['Year']).values
y = ml_data_1218['Burglaries '].values

# Encoding categorical data 
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

transformer = ColumnTransformer(transformers=[("cat", OneHotEncoder(), [0])], remainder='passthrough')
X = transformer.fit_transform(X)
X.toarray()
X = pd.DataFrame.sparse.from_spmatrix(X)

# Split into Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Feature scaling 
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train.iloc[:,149:] = sc_X.fit_transform(X_train.iloc[:,149:])
X_test.iloc[:,149:] = sc_X.transform(X_test.iloc[:,149:])

# Fitting multiple linear regression to training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting test set results
y_pred = regressor.predict(X_test)

from sklearn.metrics import r2_score
r2_score(y_test, y_pred)
python-3.x scikit-learn linear-regression
1 Answer

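So in the end this was a silly mistake: I forgot to drop the dependent variable (Burglaries) from the X columns, haha, which is why the linear regression model could make perfect predictions. With the target removed from X it now works (r2 = 0.56). Thanks everyone!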
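
For anyone hitting the same symptom, here is a minimal sketch of the leak and the fix. It does not use the original spreadsheet: the DataFrame, the `Income` and `Unemployed` columns, the noise level, and the `holdout_r2` helper are all made up for illustration. Only the `'Burglaries '` column name (with its trailing space) and the `Year` drop are taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Year": rng.integers(2012, 2019, size=n),
    "Income": rng.normal(50.0, 10.0, size=n),
    "Unemployed": rng.normal(5.0, 1.0, size=n),
})
# The target depends on the features plus noise, so a perfect fit is not expected.
df["Burglaries "] = (3.0 * df["Unemployed"] - 0.5 * df["Income"]
                     + rng.normal(0.0, 5.0, size=n))

target = "Burglaries "  # trailing space kept, as in the question
y = df[target].values


def holdout_r2(X):
    """Fit on a 70/30 split and return R^2 on the held-out 30% (illustrative helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    model = LinearRegression().fit(X_train, y_train)
    return r2_score(y_test, model.predict(X_test))


# Leaky version, as in the question: only 'Year' is dropped, so y is still a column of X.
r2_leaky = holdout_r2(df.drop(columns=["Year"]).values)

# Fixed version: drop the target from X as well.
r2_fixed = holdout_r2(df.drop(columns=["Year", target]).values)

print("R^2 with target left in X:", r2_leaky)
print("R^2 with target removed:  ", r2_fixed)
```

When the target is one of the feature columns, the model can simply give that column a coefficient of 1 and everything else 0, so the test-set R^2 comes out at essentially 1.0 regardless of how informative the real features are. A quick sanity check before fitting is to confirm that the target name is not among the columns used to build X.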
