How do I force a zero intercept in linear regression?

Question

I have some more or less linear data:

x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 40.0, 60.0, 80.0]
y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001, 3.1746209003398689, 4.2905482471260044, 6.2816226678076958, 11.073788414382639, 23.248479770546009, 32.120462301367183, 44.036117671229206, 54.009003143831116, 102.7077685684846, 185.72880217806673, 256.12183145545811, 301.97120103079675]

I am using scipy.optimize.leastsq to fit a linear regression:

import numpy
import scipy.optimize

def lin_fit(x, y):
    '''Fits a linear fit of the form mx+b to the data'''
    fitfunc = lambda params, x: params[0] * x + params[1]    #create fitting function of form mx+b
    errfunc = lambda p, x, y: fitfunc(p, x) - y              #create error function for least squares fit

    init_a = 0.5                            #find initial value for a (gradient)
    init_b = min(y)                         #find initial value for b (y axis intersection)
    init_p = numpy.array((init_a, init_b))  #bundle initial values in initial parameters

    #calculate best fitting parameters (i.e. m and b) using the error function
    p1, success = scipy.optimize.leastsq(errfunc, init_p.copy(), args = (x, y))
    f = fitfunc(p1, x)          #create a fit with those parameters
    return p1, f

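And it works beautifully (although I am not sure whether scipy.optimize is the right tool here; it might be a bit over the top?).

However, because of the way the data points lie, it does not give me a y-axis intercept at 0. I do know that it has to be zero in this case: if x = 0 then y = 0.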

Is there any way I can force this?

Answer 1

As @AbhranilDas mentioned, just use a linear method. There is no need for a non-linear solver like scipy.optimize.leastsq.

Typically, you would use numpy.polyfit to fit a line to your data, but in this case you need to use numpy.linalg.lstsq directly, because you want the intercept fixed at zero.

As a quick example:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 
              20.0, 40.0, 60.0, 80.0])

y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183, 
              44.036117671229206, 54.009003143831116, 102.7077685684846, 
              185.72880217806673, 256.12183145545811, 301.97120103079675])

# Our model is y = a * x, so things are quite simple, in this case...
# x needs to be a column vector instead of a 1D vector for this, however.
x = x[:,np.newaxis]
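# rcond=None selects the current default cutoff and avoids a FutureWarning on older NumPy
a, _, _, _ = np.linalg.lstsq(x, y, rcond=None)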

plt.plot(x, y, 'bo')
plt.plot(x, a*x, 'r-')
plt.show()


Answer 2

I am not adept at these modules, but I have some experience in statistics, so here is what I see. You need to change your fit function from

fitfunc = lambda params, x: params[0] * x + params[1]

to:

fitfunc = lambda params, x: params[0] * x

Also remove the line:

init_b = min(y)

and change the next line to:

init_p = numpy.array([init_a])

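This should get rid of the second parameter, which produces the y-intercept, and make the fitted line pass through the origin. You may need to make a few more minor changes in the rest of your code for this.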

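But yes, I am not sure whether this module will work if you just remove the second parameter like this. Whether it accepts this modification depends on the internal workings of the module. For example, I do not know where params, the list of parameters, is initialized, so I do not know whether doing just this will change its length.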
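The doubt above can be checked with a small sketch. It assumes NumPy and SciPy are installed, reuses the data from the question, and passes a one-element initial guess so that params has length 1. scipy.optimize.leastsq takes the number of parameters from the initial guess, so no further change to the solver call should be needed. The variable names slope and ier are only illustrative.

```python
import numpy
import scipy.optimize

x = numpy.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
                 20.0, 40.0, 60.0, 80.0])
y = numpy.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
                 3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
                 11.073788414382639, 23.248479770546009, 32.120462301367183,
                 44.036117671229206, 54.009003143831116, 102.7077685684846,
                 185.72880217806673, 256.12183145545811, 301.97120103079675])

# One-parameter model y = m * x: there is no intercept term to fit
fitfunc = lambda params, x: params[0] * x
errfunc = lambda p, x, y: fitfunc(p, x) - y

# A single initial guess for the slope, wrapped in a 1-element array
init_p = numpy.array([0.5])

# leastsq returns the fitted parameters and an integer status flag
p1, ier = scipy.optimize.leastsq(errfunc, init_p, args=(x, y))
slope = p1[0]
print(slope)
```

The fitted line is then slope * x, which passes through the origin by construction.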

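As an aside, since you mentioned it, I actually think this is an over-the-top way of optimizing just a slope. You could read up a little on linear regression and, after some back-of-the-envelope calculus, write a small piece of code to do it yourself. It really is very simple and straightforward. In fact, I just did the calculation, and I believe the optimal slope is simply <xy>/<x^2>, i.e. the mean of the x*y products divided by the mean of the x^2 values.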
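That formula comes from minimizing the sum of squared residuals, sum((y_i - a*x_i)^2), with respect to a. Setting the derivative to zero gives a = sum(x_i*y_i) / sum(x_i^2), and dividing top and bottom by the number of points turns the sums into the two means. A minimal pure-Python sketch, using the data from the question (the function name slope_through_origin is made up for illustration):

```python
import math

x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
     20.0, 40.0, 60.0, 80.0]
y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
     3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
     11.073788414382639, 23.248479770546009, 32.120462301367183,
     44.036117671229206, 54.009003143831116, 102.7077685684846,
     185.72880217806673, 256.12183145545811, 301.97120103079675]


def slope_through_origin(xs, ys):
    """Least-squares slope of y = a * x, i.e. sum(x*y) / sum(x*x)."""
    sxy = math.fsum(xi * yi for xi, yi in zip(xs, ys))
    sxx = math.fsum(xi * xi for xi in xs)
    return sxy / sxx


slope = slope_through_origin(x, y)
print(slope)
```

No solver is involved here, so the result does not depend on an initial guess.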

Answer 3

Starting with Python 3.11, we can use the standard library's linear_regression directly, with the intercept forced to 0:

from statistics import linear_regression

# x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 40.0, 60.0, 80.0]
# y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001, 3.1746209003398689, 4.2905482471260044, 6.2816226678076958, 11.073788414382639, 23.248479770546009, 32.120462301367183, 44.036117671229206, 54.009003143831116, 102.7077685684846, 185.72880217806673, 256.12183145545811, 301.97120103079675]
slope, intercept = linear_regression(x, y, proportional=True)
# (4.1090219715758085, 0.0)

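The proportional parameter is set to True to specify that x and y are assumed to be directly proportional (so the data is fit to a line passing through the origin).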

Answer 4

If you use scikit-learn, linear_model.LinearRegression() and linear_model.Ridge() both have a fit_intercept parameter. When that parameter is set to False, the intercept is forced to 0.
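A minimal sketch of that approach, assuming scikit-learn and NumPy are installed and reusing the data from the question. The variable names are illustrative only, and the inputs are reshaped because scikit-learn expects a 2D feature array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0,
              20.0, 40.0, 60.0, 80.0])
y = np.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
              3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
              11.073788414382639, 23.248479770546009, 32.120462301367183,
              44.036117671229206, 54.009003143831116, 102.7077685684846,
              185.72880217806673, 256.12183145545811, 301.97120103079675])

# scikit-learn expects features with shape (n_samples, n_features)
X = x.reshape(-1, 1)

# fit_intercept=False fits y = a * x with no constant term
model = LinearRegression(fit_intercept=False).fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_  # reported as 0.0 when fit_intercept=False
print(slope, intercept)
```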
