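I have written a machine learning algorithm for multivariate data and need help understanding how to improve the accuracy of its predictions.
I have looked through tutorials and other Stack Overflow posts to see whether anyone had similar code or a similar problem, but could not find anything close enough to my code to tell whether my algorithm is as accurate as it can be.
Also, I have only recently started studying machine learning and how it is implemented in practice, so if my code is completely wrong, please comment or post a solution closer to what I am looking for, so that I can review the basics of linear regression.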
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from copy import deepcopy
trainingSet = np.array(pd.read_csv("training.csv"))
target = np.array(pd.read_csv('target.csv'))
print('----------------------------------------')
#Weights
weight_1 = 5
weight_2 = 5
weight_3 = 5
weight_4 = 5
weight_5 = 5
bias = 3
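# Collect the weights into a (5, 1) column vector; dtype=float so the in-place
# updates in update_weights are not truncated to integers
weights = np.array([[weight_1],[weight_2],[weight_3],[weight_4],[weight_5]], dtype=float)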
def predict(features, w, bias):
    # Use the weights passed in as `w`, not the global `weights`
    prediction = np.dot(features, w)
    prediction += bias
    return prediction
def error(targets, predictions):
    # Mean squared error; averaging the signed differences would let
    # positive and negative errors cancel out
    return np.mean((targets - predictions) ** 2)
def update_weights(weights, targets, features, bias):
    learn = 0.00000031  # learning rate
    prediction = predict(features, weights, bias)
    # Flatten the residuals to shape (n,) so they multiply element-wise with
    # each feature column; (n,) * (n, 1) would broadcast to an (n, n) matrix
    residual = (targets - prediction).ravel()
    size = features[:, 0]
    beds = features[:, 1]
    baths = features[:, 2]
    offers = features[:, 3]
    loc = features[:, 4]
    d_w1 = -size * residual
    d_w2 = -beds * residual
    d_w3 = -baths * residual
    d_w4 = -offers * residual
    d_w5 = -loc * residual
    weights[0][0] -= learn * np.mean(d_w1)
    weights[1][0] -= learn * np.mean(d_w2)
    weights[2][0] -= learn * np.mean(d_w3)
    weights[3][0] -= learn * np.mean(d_w4)
    weights[4][0] -= learn * np.mean(d_w5)
    # Note: the bias is not updated here
    return weights
def train(weights, targets, features, bias):
    cost_history = []
    for i in range(100001):
        weights = update_weights(weights, targets, features, bias)
        prediction = predict(features, weights, bias)
        cost = error(targets, prediction)
        cost_history.append(cost)
        if i % 50000 == 0:
            print('Iteration: {}'.format(i))
            print('Prediction: {}'.format(prediction[0][0]))
            # Use the `targets` argument rather than the global `target`
            print('Target: {}'.format(targets[0][0]))
            print('Cost: {}'.format(cost))
            print('Weights: {}'.format(weights))
            print('----------------------------------------')
    return weights
weights = train(weights,target,trainingSet,bias)
print(predict(trainingSet,weights,bias))
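For comparison, the five per-weight updates can be collapsed into one matrix expression, and the bias can be learned alongside the weights. The following is a minimal sketch, not a drop-in replacement: the names `standardize` and `fit_gd`, the learning rate, the iteration count, and the synthetic data are all assumptions made for illustration. It standardizes each feature column first, which usually lets gradient descent use a far larger learning rate than 0.00000031.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit variance (hypothetical helper)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (features - mean) / std, mean, std

def fit_gd(features, targets, learn=0.1, iterations=2000):
    """Batch gradient descent on MSE for y ~ X @ w + b (illustrative sketch)."""
    n_samples, n_features = features.shape
    w = np.zeros((n_features, 1))
    b = 0.0
    for _ in range(iterations):
        residual = features @ w + b - targets        # shape (n_samples, 1)
        grad_w = 2.0 * features.T @ residual / n_samples
        grad_b = 2.0 * residual.mean()
        w -= learn * grad_w
        b -= learn * grad_b
    return w, b

# Synthetic data standing in for training.csv / target.csv (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1000.0, 3.0, 2.0, 5.0, 10.0])
true_w = np.array([[0.2], [15.0], [10.0], [-3.0], [4.0]])
y = X @ true_w + 50.0 + rng.normal(scale=1.0, size=(200, 1))

X_scaled, mean, std = standardize(X)
w, b = fit_gd(X_scaled, y)
mse = float(np.mean((X_scaled @ w + b - y) ** 2))
print("MSE on training data:", mse)
```

To predict on new rows, the same `mean` and `std` computed from the training data would have to be applied to them before multiplying by `w`.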
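I am not sure how accurate predictions from linear regression on multiple input variables can be expected to be, but after training my MSE cost is usually around 600. Thanks :)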
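One way to judge whether an MSE of about 600 is good is to compare it against the closed-form least-squares fit, which gives the lowest training MSE any linear model can reach on the same data, and to look at R² so the error is read relative to the spread of the targets. A hedged sketch, using synthetic data in place of the real CSV files; `baseline_report` is a hypothetical helper name:

```python
import numpy as np

def baseline_report(features, targets):
    """Return (mse, r2) of the ordinary least-squares fit with an intercept."""
    n_samples = features.shape[0]
    # Append a column of ones so the intercept is fitted together with the weights
    design = np.hstack([features, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    residual = targets - design @ coef
    mse = float(np.mean(residual ** 2))
    # R^2 = 1 - SS_res / SS_tot
    ss_tot = float(np.sum((targets - targets.mean()) ** 2))
    r2 = 1.0 - float(np.sum(residual ** 2)) / ss_tot
    return mse, r2

# Synthetic stand-in for the real data (assumption)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([[3.0], [-1.0], [2.0], [0.5], [4.0]]) + 7.0 + rng.normal(scale=0.5, size=(100, 1))

mse, r2 = baseline_report(X, y)
print("least-squares MSE:", mse)
print("R^2:", r2)
```

If gradient descent ends with a training MSE well above this baseline, training has not converged yet. If both land near the same value, the remaining error comes from the linear model or the data rather than from the optimizer.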
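Also, if more code is needed, I can provide the dataset and the rest of the code file (Stack Overflow specifically says not to post my whole file, which is why I only included my code for updating the weights). If it is just completely wrong, please comment.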
Update: I have updated the code with a version that actually computes the cost correctly.
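On computing the cost correctly: averaging the signed differences `target - prediction` lets positive and negative errors cancel, so it can report a value near zero for a poor fit, whereas MSE squares each difference first. A small illustration with made-up numbers (the arrays are assumptions, not values from the dataset):

```python
import numpy as np

targets = np.array([[10.0], [20.0], [30.0]])
predictions = np.array([[20.0], [20.0], [20.0]])  # errors of -10, 0, +10

mean_signed_error = float(np.mean(targets - predictions))
mse = float(np.mean((targets - predictions) ** 2))

print(mean_signed_error)  # 0.0: the errors cancel out
print(mse)                # about 66.67: squaring prevents cancellation
```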
I made a version of multivariate linear regression, idk if it is exactly what you need: https://github.com/GDalforno/Machine-Learning-From-Scratch/blob/master/Linear_Regression.py