Linear regression with matplotlib / numpy

Question (votes: 0, answers: 9)

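I'm trying to draw a linear regression line on a scatter plot I have generated, but my data is in list format, and every example of polyfit I can find uses arange. But arange doesn't accept lists. I have searched a lot for how to convert a list to an array, but nothing seems clear. Am I missing something?

Following on, how can I best use my lists of integers as input to polyfit?

This is the polyfit example I am following: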

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(data)
y = np.arange(data)

m, b = np.polyfit(x, y, 1)

plt.plot(x, y, 'yo', x, m*x+b, '--k')
plt.show()

Tags: python numpy matplotlib linear-regression curve-fitting

9 answers

232 votes

arange generates lists (well, numpy arrays); type help(np.arange) for the details. You don't need to call it on existing lists.

>>> x = [1,2,3,4]
>>> y = [3,5,7,9] 
>>> 
>>> m,b = np.polyfit(x, y, 1)
>>> m
2.0000000000000009
>>> b
0.99999999999999833
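
To make the distinction concrete, here is a minimal sketch (the values in data are illustrative, not from the question): arange builds a new sequence of evenly spaced numbers, whereas np.array converts the values of an existing list.

```python
import numpy as np

data = [3, 5, 7, 9]  # an existing list of measurements (illustrative values)

generated = np.arange(len(data))  # a new array 0, 1, 2, 3: handy as an index axis
converted = np.array(data)        # the list's own values, as an array

print(generated)
print(converted)

# If only y-values are available, arange can supply the x positions.
m, b = np.polyfit(generated, converted, 1)
print(m, b)
```

So arange is only needed when the x-values have to be generated; when both x and y already exist as lists, they can be passed to polyfit directly.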

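I should add that I tend to use poly1d here rather than write out "m*x+b" and the higher-order equivalents, so my version of your code would look something like this: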

import numpy as np
import matplotlib.pyplot as plt

x = [1,2,3,4]
y = [3,5,7,10] # 10, not 9, so the fit isn't perfect

coef = np.polyfit(x,y,1)
poly1d_fn = np.poly1d(coef) 
# poly1d_fn is now a function which takes in x and returns an estimate for y

plt.plot(x,y, 'yo', x, poly1d_fn(x), '--k') #'--k'=black dashed line, 'yo' = yellow circle marker

plt.xlim(0, 5)
plt.ylim(0, 12)
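
As a sketch of the "higher-order equivalents" mentioned above, the same two calls work for a quadratic fit by changing the degree. The data here is made up (it follows y = x**2 + 1 exactly) purely for illustration.

```python
import numpy as np

x = [0, 1, 2, 3, 4]
y = [1, 2, 5, 10, 17]  # illustrative data: exactly x**2 + 1

coef = np.polyfit(x, y, 2)   # degree 2 instead of 1
poly1d_fn = np.poly1d(coef)  # callable polynomial, no need to write a*x**2 + b*x + c

print(poly1d_fn.order)       # degree of the fitted polynomial
print(poly1d_fn(5))          # estimate for a new x value
```

The plotting call stays the same as in the linear case, since poly1d_fn(x) returns the fitted values whatever the degree.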



44 votes

This code:

from scipy.stats import linregress

linregress(x,y) #x and y are arrays or lists.

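gives a list with the following:

slope : float
    slope of the regression line
intercept : float
    intercept of the regression line
r-value : float
    correlation coefficient
p-value : float
    two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero
stderr : float
    standard error of the estimate

Source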
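
For readers who want to see where the first three values come from, here is a rough numpy-only sketch using the textbook least-squares formulas. It is not how scipy implements linregress, and the data is the illustrative set from the first answer.

```python
import numpy as np

x = np.asarray([1, 2, 3, 4], dtype=float)
y = np.asarray([3, 5, 7, 10], dtype=float)

dx = x - x.mean()  # deviations from the mean
dy = y - y.mean()

slope = (dx * dy).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()
r_value = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(slope, intercept, r_value)
```

The slope and intercept should agree with np.polyfit(x, y, 1), and r_value with np.corrcoef(x, y)[0, 1], up to floating-point rounding.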


20 votes

Use statsmodels.api.OLS to get a detailed breakdown of the fit/coefficients/residuals:

import statsmodels.api as sm

df = sm.datasets.get_rdataset('Duncan', 'carData').data
y = df['income']
x = df['education']

model = sm.OLS(y, sm.add_constant(x))
results = model.fit()

print(results.params)
# const        10.603498 <- intercept
# education     0.594859 <- slope
# dtype: float64

print(results.summary())
#                             OLS Regression Results                            
# ==============================================================================
# Dep. Variable:                 income   R-squared:                       0.525
# Model:                            OLS   Adj. R-squared:                  0.514
# Method:                 Least Squares   F-statistic:                     47.51
# Date:                Thu, 28 Apr 2022   Prob (F-statistic):           1.84e-08
# Time:                        00:02:43   Log-Likelihood:                -190.42
# No. Observations:                  45   AIC:                             384.8
# Df Residuals:                      43   BIC:                             388.5
# Df Model:                           1                                         
# Covariance Type:            nonrobust                                         
# ==============================================================================
#                  coef    std err          t      P>|t|      [0.025      0.975]
# ------------------------------------------------------------------------------
# const         10.6035      5.198      2.040      0.048       0.120      21.087
# education      0.5949      0.086      6.893      0.000       0.421       0.769
# ==============================================================================
# Omnibus:                        9.841   Durbin-Watson:                   1.736
# Prob(Omnibus):                  0.007   Jarque-Bera (JB):               10.609
# Skew:                           0.776   Prob(JB):                      0.00497
# Kurtosis:                       4.802   Cond. No.                         123.
# ==============================================================================
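
As a rough illustration of what sm.add_constant and the OLS fit amount to, the sketch below builds the design matrix by hand and solves the least-squares problem with np.linalg.lstsq. This is a stand-in for explanation only, not statsmodels' internals, and the small data set is made up rather than taken from the Duncan data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative data
y = np.array([3.0, 5.0, 7.0, 10.0])

# Column of ones first, then x: the same layout add_constant produces by default.
X = np.column_stack([np.ones_like(x), x])

params, *_ = np.linalg.lstsq(X, y, rcond=None)
b, m = params                 # intercept first, slope second
residuals = y - X @ params    # observed minus fitted values

print(b, m)
print(residuals)
```

The intercept-first ordering is why the statsmodels snippets unpack the parameters as b, m rather than m, b.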

New in matplotlib 3.3.0

To plot the best-fit line, just pass the slope m and intercept b into the new plt.axline:

import matplotlib.pyplot as plt

# extract intercept b and slope m
b, m = results.params

# plot y = m*x + b
plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')

Note that the slope m and intercept b can be easily extracted from any of the common regression methods:

  • numpy.polyfit

    import numpy as np
    
    m, b = np.polyfit(x, y, deg=1)
    plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
    
  • scipy.stats.linregress

    from scipy import stats
    
    m, b, *_ = stats.linregress(x, y)
    plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
    
  • statsmodels.api.OLS

    import statsmodels.api as sm
    
    b, m = sm.OLS(y, sm.add_constant(x)).fit().params
    plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
    
  • sklearn.linear_model.LinearRegression

    from sklearn.linear_model import LinearRegression
    
    reg = LinearRegression().fit(x[:, None], y)
    b = reg.intercept_
    m = reg.coef_[0]
    plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
    

8 votes

import numpy as np
import matplotlib.pyplot as plt 
from scipy import stats

x = np.array([1.5,2,2.5,3,3.5,4,4.5,5,5.5,6])
y = np.array([10.35,12.3,13,14.0,16,17,18.2,20,20.7,22.5])
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
mn=np.min(x)
mx=np.max(x)
x1=np.linspace(mn,mx,500)
y1=gradient*x1+intercept
plt.plot(x,y,'ob')
plt.plot(x1,y1,'-r')
plt.show()

Use this.


6 votes

George's answer combines nicely with matplotlib's axline, which draws an infinite line.

from scipy.stats import linregress
import matplotlib.pyplot as plt

reg = linregress(x, y)
plt.axline(xy1=(0, reg.intercept), slope=reg.slope, linestyle="--", color="k")

3 votes

import numpy as np
import matplotlib.pyplot as plt

x1 = [1, 2, 3, 4]   # for example, this is a list (illustrative values)
y1 = [3, 5, 7, 10]  # for example, this is a list (illustrative values)
x = np.array(x1)    # this converts a list into an array
y = np.array(y1)
m, b = np.polyfit(x, y, 1)

plt.plot(x, y, 'yo', x, m*x+b, '--k')
plt.show()

2 votes

Another quick and dirty answer is that you can convert your list to an array using:

import numpy as np
arr = np.asarray(listname)
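
A small sketch of how np.asarray differs from np.array (listname is the hypothetical list from the line above, filled with illustrative values): both convert a list, but asarray returns an existing array unchanged, while array makes a copy by default.

```python
import numpy as np

listname = [1, 2, 3, 4]      # hypothetical input list

arr = np.asarray(listname)   # list -> new array
same = np.asarray(arr)       # already an array: returned as-is, no copy
copy = np.array(arr)         # np.array copies by default

print(same is arr)
print(copy is arr)
```

This makes asarray a convenient choice at the top of a function that should accept either lists or arrays.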

0 votes

I suggest you try the premath library. You can search for it on PyPI.


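-2 votes

Linear regression is a good example for getting started with artificial intelligence.

Here is a good example of a multiple linear regression machine learning algorithm using Python: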

##### Predicting House Prices Using Multiple Linear Regression - @Y_T_Akademi
    
#### In this project we are going to see how machine learning algorithms help us predict house prices. Linear regression predicts new data by using the relationship observed in existing data. Here, machine learning helps us identify this relationship between the feature data and the output, so we can predict future values.

import pandas as pd

##### we use sklearn library in many machine learning calculations..

from sklearn import linear_model

##### we import our dataset: housepricesdataset.csv

df = pd.read_csv("housepricesdataset.csv",sep = ";")

##### The following is our feature set:
##### The following is the output(result) data:
##### we define a linear regression model here: 

reg = linear_model.LinearRegression()
reg.fit(df[['area', 'roomcount', 'buildingage']], df['price'])

# Since our model is ready, we can make predictions now:
# lets predict a house with 230 square meters, 4 rooms and 10 years old building..

reg.predict([[230,4,10]])

# Now lets predict a house with 230 square meters, 6 rooms and 0 years old building - its new building..
reg.predict([[230,6,0]])

# Now lets predict a house with 355 square meters, 3 rooms and 20 years old building 
reg.predict([[355,3,20]])

# You can make as many predictions as you want..
reg.predict([[230,4,10], [230,6,0], [355,3,20], [275, 5, 17]])

My dataset is as follows:

[Image: screenshot of the dataset]
