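I have a dataset with the following variables: number of rooms, neighborhood, price for 2018-2019, price for 2017-2018, price for 2016-2017, price for 2015-2016, furnished or unfurnished, building age, ...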
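My goal is to predict house prices from these variables. Is there any sensible way to predict how the price will change in the future? My first idea was to take the average of all the previous years and use that as the prediction, but that doesn't seem accurate. My next idea was to fit a LinearRegression object and use its intercept and coefficients to calculate the value manually.
But that doesn't take any of the current year's data into account (if prices have been consistently above average, the guess should probably be a bit higher).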
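To make the concern above concrete, here is a minimal sketch comparing the plain average of past years with a naive "last price plus average yearly change" guess. The drift rule is only an illustrative alternative, not an established formula for house prices, and the numbers and function names (`mean_baseline`, `drift_forecast`) are made up for the example.

```python
from statistics import mean

# Hypothetical yearly prices for one house, oldest first (toy numbers, not real data).
past_prices = [200_000, 210_000, 220_000, 230_000]

def mean_baseline(prices):
    """Predict next year's price as the plain average of the past prices."""
    return mean(prices)

def drift_forecast(prices):
    """Predict next year's price as the last price plus the average yearly change."""
    yearly_changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    return prices[-1] + mean(yearly_changes)

baseline = mean_baseline(past_prices)
drift = drift_forecast(past_prices)
print(baseline, drift)
```

When prices rise every year, the average lands below the most recent price, while the drift guess continues the trend. That is the gap described above.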
Is there a specific formula for predicting house prices?
```python
import sys
import warnings

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

np.set_printoptions(threshold=sys.maxsize)
warnings.filterwarnings('ignore')

df = pd.read_excel('houses.xlsx')
# No data cleaning to do

# One-hot encode the categorical columns
new_df = pd.get_dummies(df)
# Placeholder column for the year I want to predict (all zeros for now)
new_df['Price 2019 - 2020'] = np.zeros(len(new_df))

# Features: every column except the target; target: the latest known price
X = new_df[new_df.columns.difference(['Price 2018 - 2019'])]
y = new_df['Price 2018 - 2019']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=50)

reg = LinearRegression().fit(X_train, y_train)

# For a regressor, score() returns R^2, not accuracy
sklearn_train_r2 = reg.score(X_train, y_train)
sklearn_test_r2 = reg.score(X_test, y_test)

# Predict on the test data
sklearn_pred = reg.predict(X_test)
sklearn_mae = mean_absolute_error(y_test, sklearn_pred)
sklearn_rmse = np.sqrt(mean_squared_error(y_test, sklearn_pred))

metrics_df = pd.DataFrame(
    [[sklearn_train_r2], [sklearn_test_r2], [sklearn_mae], [sklearn_rmse]],
    index=['Train R2', 'Test R2', 'MAE', 'RMSE'],
    columns=['Scikit-learn'])
print(metrics_df)

# For retrieving the intercept:
print(reg.intercept_)

# For retrieving the coefficients (one slope per feature):
coeff_df = pd.DataFrame(reg.coef_, X.columns, columns=['Coefficient'])
coeff_df
```
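For reference, the manual calculation from the intercept and coefficients amounts to `prediction = intercept + sum(coef_i * x_i)`. Below is a minimal sketch on made-up toy data. `X_toy`, `y_toy` and `reg_toy` are hypothetical names, not part of the notebook above, and the three toy features simply stand in for past-year prices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (made up): 3 past-year prices per house -> the following year's price.
rng = np.random.default_rng(0)
X_toy = rng.uniform(100, 500, size=(20, 3))
y_toy = 10.0 + X_toy @ np.array([0.2, 0.3, 0.6]) + rng.normal(0, 1, size=20)

reg_toy = LinearRegression().fit(X_toy, y_toy)

# Manual prediction: intercept plus the dot product of features and coefficients.
manual_pred = reg_toy.intercept_ + X_toy @ reg_toy.coef_

# This should agree with the library's own predict() up to floating-point error.
matches = np.allclose(manual_pred, reg_toy.predict(X_toy))
print(matches)
```

So the manual route gives nothing beyond what `predict()` already returns. It is mainly useful for checking how much each feature contributes to a given prediction.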
Any insight would be helpful.