Numpy vectorized approach to regression: multiple dependent columns (x) on a single independent column (y)

Question

Consider the following (3, 13) np.array:

import numpy as np
import pandas as pd
from scipy.stats import linregress

# three x series, one per row of the (3, 13) array
a = [-0.00845,-0.00568,-0.01286,-0.01302,-0.02212,-0.01501,-0.02132,-0.00783,-0.00942,0.00158,-0.00016,0.01422,0.01241]
b = [0.00115,0.00623,0.00160,0.00660,0.00951,0.01258,0.00787,0.01854,0.01462,0.01479,0.00980,0.00607,-0.00106]
c = [-0.00233,-0.00467,0.00000,0.00000,-0.00952,-0.00949,-0.00958,-0.01696,-0.02212,-0.01006,-0.00270,0.00763,0.01005]
array = np.array([a,b,c])
# the shared y values: 13 dates, later converted to Julian dates
yvalues = pd.to_datetime(['2019-12-15','2019-12-16','2019-12-17','2019-12-18','2019-12-19','2019-12-22','2019-12-23','2019-12-24',
                          '2019-12-25','2019-12-26','2019-12-29','2019-12-30','2019-12-31'], errors='coerce')

I can successfully run an OLS regression on one series (one row of array) at a time, like this:

out = linregress(array[0], y=yvalues.to_julian_date())
print(out)
LinregressResult(slope=329.141087037396, intercept=2458842.411731361, rvalue=0.684426534581417, pvalue=0.009863937200252878, stderr=105.71465449878443)

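However, what I would like to do is run the regression over the whole matrix array in one go, with the 'y' variable (yvalues) held constant for every series. A loop is a possible solution, but it is cumbersome. I tried expanding yvalues with np.tile so that its shape matches array, but that does not seem to be the right approach. Thanks for any help.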

python numpy scipy linear-regression
1 Answer

IIUC, you are looking for a vectorized equivalent of the following list comprehension:

out = [linregress(array[i], y=yvalues.to_julian_date()) for i in range(array.shape[0])]

out
[LinregressResult(slope=329.141087037396, intercept=2458842.411731361, rvalue=0.684426534581417, pvalue=0.009863937200252876, stderr=105.71465449878443),
 LinregressResult(slope=178.44888292241782, intercept=2458838.7056912296, rvalue=0.1911788042719021, pvalue=0.5315353013148307, stderr=276.24376878908953),
 LinregressResult(slope=106.86168938856262, intercept=2458840.7656617565, rvalue=0.17721031419860186, pvalue=0.5624701260912525, stderr=178.940293876864)]

Honestly, I have never seen this implemented with scipy or statsmodels functionality.

So we can implement it ourselves using numpy broadcasting:

x = array
y = np.array(yvalues.to_julian_date())

# mean of each x series (one per row) and of the shared y
x_mean = np.mean(x, axis=1)
y_mean = np.mean(y)

# closed-form OLS slope: sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)**2),
# evaluated for every row at once via broadcasting
num = np.sum((x - x_mean[:, np.newaxis]) * (y - y_mean)[np.newaxis, :], axis=1)
den = np.sum((x - x_mean[:, np.newaxis])**2, axis=1)

slopes = num / den
intercepts = y_mean - slopes * x_mean
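
For reference, the formula the comment refers to is presumably the standard closed-form simple OLS estimate. Written out per row, with i indexing the rows of x and j the n observations (this index notation is added here for illustration and is not from the original answer), the numerator and denominator sums above correspond to:

```latex
\hat{\beta}_i = \frac{\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)\,(y_j - \bar{y})}
                     {\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2},
\qquad
\hat{\alpha}_i = \bar{y} - \hat{\beta}_i\,\bar{x}_i
```

Here \hat{\beta}_i is the slope and \hat{\alpha}_i the intercept for row i, \bar{x}_i is the mean of row i, and \bar{y} is the mean of the shared y values.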

slopes
array([329.14108704, 178.44888292, 106.86168939])

intercepts
array([2458842.41173136, 2458838.70569123, 2458840.76566176])
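
The slopes and intercepts match the loop over linregress. If the rvalue and stderr fields are needed as well, the same broadcasting idea extends to them. The following is a minimal sketch, not part of the original answer: it uses the textbook formulas r = S_xy / sqrt(S_xx * S_yy) and stderr = sqrt((1 - r^2) * S_yy / S_xx / (n - 2)), and it assumes that day offsets from the first date can stand in for the Julian dates (shifting y by a constant moves the intercepts but leaves slopes, r values, and standard errors unchanged). The names ss_xy, ss_xx, ss_yy, and dof are chosen here for illustration.

```python
import numpy as np

# Same data as in the question: three x series, one per row.
a = [-0.00845,-0.00568,-0.01286,-0.01302,-0.02212,-0.01501,-0.02132,-0.00783,-0.00942,0.00158,-0.00016,0.01422,0.01241]
b = [0.00115,0.00623,0.00160,0.00660,0.00951,0.01258,0.00787,0.01854,0.01462,0.01479,0.00980,0.00607,-0.00106]
c = [-0.00233,-0.00467,0.00000,0.00000,-0.00952,-0.00949,-0.00958,-0.01696,-0.02212,-0.01006,-0.00270,0.00763,0.01005]
x = np.array([a, b, c])

# Assumption: use day offsets from the first date instead of Julian dates.
# This only shifts y by a constant, so slope, r, and stderr are unaffected.
dates = np.array(['2019-12-15', '2019-12-16', '2019-12-17', '2019-12-18',
                  '2019-12-19', '2019-12-22', '2019-12-23', '2019-12-24',
                  '2019-12-25', '2019-12-26', '2019-12-29', '2019-12-30',
                  '2019-12-31'], dtype='datetime64[D]')
y = (dates - dates[0]) / np.timedelta64(1, 'D')   # float days, shape (13,)

# Deviations from the mean; dy broadcasts across the three rows of dx.
dx = x - x.mean(axis=1, keepdims=True)   # shape (3, 13)
dy = y - y.mean()                        # shape (13,)

# Sums of squares and cross products, one value per row of x.
ss_xy = (dx * dy).sum(axis=1)
ss_xx = (dx ** 2).sum(axis=1)
ss_yy = (dy ** 2).sum()

dof = x.shape[1] - 2                     # degrees of freedom, n - 2

slopes = ss_xy / ss_xx
rvalues = ss_xy / np.sqrt(ss_xx * ss_yy)
stderrs = np.sqrt((1.0 - rvalues ** 2) * ss_yy / ss_xx / dof)

print(slopes)
print(rvalues)
print(stderrs)
```

A two-sided p-value could then be derived from the t statistic slopes / stderrs with dof degrees of freedom, for example with scipy.stats.t.sf; that step is left out here to keep the sketch numpy-only.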