Multivariate regression in Python

Question (votes: 0, answers: 3)

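I want to perform multivariate linear regression in Python, based on multiple dependent data arrays and multiple independent data arrays.

I have seen plenty of multiple linear regressions with several independent inputs, and almost everyone treats "multiple" as meaning "multivariate", but they are not the same thing. I cannot find any truly multivariate tutorial on the internet. What I want is multiple outputs plus multiple inputs.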

from pandas import DataFrame
from sklearn import linear_model
import statsmodels.api as sm

Stock_Market = {'Year': [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018],
                'Agriculture': [1, 0.8965517282485962, 0.4350132942199707, 0.5384615659713745, 1.1071428582072258, 0.1071428582072258, 0.1290322244167328, -0.07096776366233826, -0.37857140600681305, -0.439440980553627, -0.2020460031926632, -0.16339869424700737, 2.277777746319771], 
                'Demand_risk':[1,0.015701416,0.638652235,0.744531459,0.630988038,0.787568771,1.796302615,1.708789548,1.897916832,1.643077606,1.579785002,2.444568612,2.626896547],
                'International_risk':[1,1.609574468,1.225836431,1.30566937,1.771415837,1.737162303,2.156292933,2.365513975,2.502820771,2.660719511,2.468833192,2.624733983,2.577283326],
                'Production_risk': [1,0.76346912,1.421097464,1.423616355,1.434009229,1.307186577,1.378837063,1.3577073,1.744395371,1.744281735,1.559044776,1.570226289,1.116485043],
                'Technology_risk': [1,1.029845201,1.042711964,1.053634438,1.038367263,0.659816279,0.90179752,1.448686704,1.836091216,1.644680334,1.413661748,1.089683923,1.191047799]        
                }


df = DataFrame(Stock_Market,columns=['Year','Agriculture','Demand_risk','International_risk','Production_risk', 'Technology_risk']) 

X = df[['Demand_risk', 'International_risk', 'Production_risk', 'Technology_risk']]  # four input (independent) variables
Y = df['Year', 'Agriculture']  # intended output variables; this line raises KeyError, because selecting several columns needs a list: df[['Year', 'Agriculture']]

# with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)

# compute with statsmodels, adding the intercept manually
X1 = sm.add_constant(X)
result = sm.OLS(Y, X1).fit()
print(result.rsquared, result.rsquared_adj)

I want to change the output variable Y so that it can handle multiple arrays rather than a single one (at the moment it throws an error).

python regression multivariate-testing
3 Answers

0 votes

You seem to be looking for an implementation like this:

https://www.statsmodels.org/stable/generated/statsmodels.multivariate.multivariate_ols._MultivariateOLS.html#statsmodels.multivariate.multivariate_ols._MultivariateOLS

Unfortunately, it does not seem to work at the moment. The intended usage appears to be:

import statsmodels.multivariate.multivariate_ols

model = statsmodels.multivariate.multivariate_ols._MultivariateOLS(y, X)
fit_model = model.fit()
results = statsmodels.multivariate.multivariate_ols._MultivariateOLSResults(fit_model)
results.summary_frame()

But this results in a NotImplementedError.

    466     def summary(self):
--> 467         raise NotImplementedError

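However, I did find that I can run what I believe is a MANOVA test this way, even though the documentation seems to suggest it is a multivariate regression: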

from statsmodels.multivariate.multivariate_ols import _MultivariateOLS, MultivariateTestResults

# data, y and X are not defined in this snippet; y and X are presumably the
# response and predictor column subsets of the DataFrame data
model = _MultivariateOLS.from_formula('y1 + y2 ~ x1 + x2 + x3 + x4', data)
results = model.fit()  # method='svd'
mv_test = results.mv_test()
mv_results = MultivariateTestResults(mv_test.results,
                                     endog_names=y.columns,
                                     exog_names=X.columns)
mv_results.summary_frame

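Information about mv_test

It took me a while to track this down, because most tutorials and texts frustratingly mix up "multivariate" and "multiple". The only actual packages I found that address this are sklearn's MLPRegressor and the SHAP package.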

https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/model_agnostic/Multioutput%20Regression%20SHAP.html

https://machinelearningmastery.com/deep-learning-models-for-multi-output-regression/

But if you find another solution (one not based on neural networks), please let me know. Thanks, and good luck.
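For what it is worth, scikit-learn's LinearRegression accepts a two-dimensional target, so one option that is not a neural network may be as simple as the sketch below. The data are synthetic and every variable name is an assumption made for illustration, not the asker's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): 4 inputs, 2 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
true_coef = np.array([[1.0, -2.0, 0.5, 0.0],
                      [0.0, 3.0, -1.0, 2.0]])  # shape (n_outputs, n_inputs)
true_intercept = np.array([0.5, -1.5])
Y = X @ true_coef.T + true_intercept + 0.01 * rng.normal(size=(50, 2))

# LinearRegression fits all output columns in one call when Y is 2-D.
regr = LinearRegression().fit(X, Y)

print("Intercepts:", regr.intercept_)   # one intercept per output
print("Coefficients:\n", regr.coef_)    # shape (n_outputs, n_inputs)
```

This estimates each output's coefficients by ordinary least squares, so it gives point estimates only; it does not report the joint multivariate test statistics that mv_test produces.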


0 votes

Not a truly multivariate approach, but a neat workaround:

import statsmodels.api as sm

# fit one OLS model per output column with statsmodels
X = sm.add_constant(df[['X1', 'X2', 'X3', 'X4']])  # add the intercept once
y = df[['y1', 'y2']]

ms = {}
for col in y.columns:
    ms[col] = sm.OLS(y[col], X).fit().summary()

for k, v in ms.items():
    print(k, "\n", v, "\n")
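The per-column loop yields the same coefficient estimates as a single least-squares fit against the whole output matrix, which is why the workaround is sound for point estimates. A minimal sketch with NumPy on synthetic data (all names here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])  # intercept + 4 inputs
Y = rng.normal(size=(n, 2))                                   # 2 outputs

# Joint fit: solve min ||X B - Y|| for the whole coefficient matrix B (5 x 2).
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Column-by-column fit, mirroring the loop over output columns.
B_cols = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(Y.shape[1])]
)

print(np.allclose(B_joint, B_cols))
```

What the separate fits do not capture is the correlation between the residuals of the different outputs, which matters for joint hypothesis tests rather than for the coefficients themselves.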

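0 votes

If you build statsmodels from source, then as of v0.15.0dev there is a MultivariateLS class in statsmodels.multivariate.multivariate_ols. After fitting, the results class has a summary function and a params attribute. See the link for an example. Note that if you simply pip install statsmodels, MultivariateLS is not available as of v0.14.2; you need to build from source by cloning the package from GitHub, cd-ing into the package directory, and finally running "pip install -e .". You can then import it as in the example linked below.

https://www.statsmodels.org/dev/examples/notebooks/generated/multivariate_ls.html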
