How to use LinearRegression in Python


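I want to extract the relationship between two features in my CSV file. I want to use LinearRegression to determine how obesity trends over the years. Here is my code: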

CODE

#Analysis of obesity by country

import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
import numpy as np
import sklearn
from sklearn import metrics
from sklearn.linear_model import LinearRegression

address = 'C:/Users/Andre/Desktop/Python/firstMN/obesity-cleaned.csv'
dt = pd.read_csv(address)

#eliminate superfluous data
dt.drop(dt['Obesity (%)'][dt['Obesity (%)'].values == 'No data'].index, inplace=True)  

for i in range(len(dt)):
   dt['Obesity (%)'].values[i] = float(dt['Obesity (%)'].values[i].split()[0])  

obMean = dt['Obesity (%)'].mean() 
print('%0.3f' %obMean, '\n') 

dt['Obesity (%)'] = dt['Obesity (%)'].astype(float)  #convert the column to float

group = dt.groupby('Country')


print(group[['Year', 'Obesity (%)']].mean(), '\n') 

dt1 = dt[dt['Sex'] == 'Both sexes']   

print(dt1[dt1['Obesity (%)'] == dt1['Obesity (%)'].max()], '\n')   

sb.lmplot(x='Year', y='Obesity (%)', data=dt1)
plt.show()

#linear regression predictions

group1 = dt1.groupby('Year')

x = np.array(np.linspace(1975, 2016, 2016-1975+1)).tolist() 
y = np.array([group1['Obesity (%)'].mean()]).tolist()[0]

x1 = np.array(np.linspace(1975, 2016, 2016-1975+1)).reshape(1, -1) 
y1 = np.array([group1['Obesity (%)'].mean()]) 

lr = LinearRegression(fit_intercept=False)
lr.fit(x1, y1) 

plt.plot(x, y) 
plt.show() 

print('Coefficients: ', lr.coef_)  
print("Intercept: ", lr.intercept_ )

y_hat = lr.predict(x1)
print('MSE: ', sklearn.metrics.mean_squared_error(y_hat, y1)) 
print('R^2: ', lr.score(x1, y1) ) 
print('var: ', y1.var())

The problem is that I get many coefficients, whereas I expected a single coefficient and a single intercept. Why does this happen?

Output

Coefficients:  [[7.68857169e-05 7.69246464e-05 7.69635759e-05 ... 7.84039665e-05
  7.84428960e-05 7.84818255e-05]
 [7.95627446e-05 7.96030295e-05 7.96433144e-05 ... 8.11338570e-05
  8.11741419e-05 8.12144269e-05]
 [8.22150421e-05 8.22566700e-05 8.22982979e-05 ... 8.38385290e-05
  8.38801569e-05 8.39217848e-05]
 ...
 [2.24882685e-04 2.24996549e-04 2.25110414e-04 ... 2.29323406e-04
  2.29437271e-04 2.29551135e-04]
 [2.30366573e-04 2.30483214e-04 2.30599855e-04 ... 2.34915584e-04
  2.35032225e-04 2.35148866e-04]
 [2.35708263e-04 2.35827609e-04 2.35946955e-04 ... 2.40362755e-04
  2.40482101e-04 2.40601447e-04]]
Intercept:  0.0
MSE:  7.099748146989106e-30

As you can see, my intercept is 0, which I assume is because I set fit_intercept=False, but I get more than one coefficient. Why?

python machine-learning data-science linear-regression sklearn-pandas
1 Answer

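It produces multiple coefficients because that is what your code asks it to do. Have you considered building your own regressor to get the result? I built my own by following a tutorial I found on Enlight: https://enlight.nyc/projects/linear-regression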
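To make that concrete: scikit-learn reads `X` as (n_samples, n_features), so `reshape(1, -1)` in the question turns the 42 years into a single sample with 42 features, and `y1` into a single sample with 42 targets. The fitted `coef_` then has shape (n_targets, n_features), which is the 42 x 42 matrix in the output. The sketch below is only an illustration under stated assumptions: it uses synthetic yearly means on an exact line (the names `years`, `obesity`, `true_slope` and `true_intercept` are made up here, not taken from the CSV), fits the data once with the question's shape and once as one column of years, and then checks the result against a small hand-written least-squares regressor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the yearly means (assumption: NOT the real obesity data).
years = np.arange(1975, 2017, dtype=float)      # 42 values, 1975..2016
true_slope, true_intercept = 0.25, -480.0       # hypothetical trend
obesity = true_slope * years + true_intercept

# Shape used in the question: 1 sample, 42 features, 42 targets.
x_wide = years.reshape(1, -1)
y_wide = obesity.reshape(1, -1)
wide = LinearRegression(fit_intercept=False).fit(x_wide, y_wide)
print("question's shape -> coef_ shape:", wide.coef_.shape)

# Shape for a simple regression: 42 samples, 1 feature, 1-D target.
x_col = years.reshape(-1, 1)
model = LinearRegression().fit(x_col, obesity)
slope = model.coef_[0]
intercept = model.intercept_
print("slope:", slope, "intercept:", intercept)
print("R^2:", model.score(x_col, obesity))

# Hand-built regressor: closed-form ordinary least squares for one feature.
x_mean, y_mean = years.mean(), obesity.mean()
slope_manual = np.sum((years - x_mean) * (obesity - y_mean)) / np.sum((years - x_mean) ** 2)
intercept_manual = y_mean - slope_manual * x_mean
print("manual slope:", slope_manual, "manual intercept:", intercept_manual)
```

With the real data, the same change would mean building `x1` with `reshape(-1, 1)` and passing `group1['Obesity (%)'].mean().values` as a 1-D target. Leaving `fit_intercept` at its default of `True` is probably what you want here, since the years start at 1975 rather than 0.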
