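I want a coefficient and the Newey-West standard error associated with it.
I am looking for a Python library (ideally, but any working solution is fine) that can do what the following R code does: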
library(sandwich)
library(lmtest)
a <- matrix(c(1,3,5,7,4,5,6,4,7,8,9))
b <- matrix(c(3,5,6,2,4,6,7,8,7,8,9))
temp.lm = lm(a ~ b)
temp.summ <- summary(temp.lm)
temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest))
print (temp.summ$coefficients)
Result:
             Estimate Std. Error   t value  Pr(>|t|)
(Intercept) 2.0576208  2.5230532 0.8155281 0.4358205
b           0.5594796  0.4071834 1.3740235 0.2026817
I get the coefficients and the standard errors associated with them.
I see the cov_hac function in statsmodels.stats.sandwich_covariance, but I don't see how to make it work with OLS.
Edited (10/31/2015) to reflect the preferred coding style for statsmodels as of fall 2015.
In statsmodels version 0.6.1 you can do the following:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9],
'b':[3,5,6,2,4,6,7,8,7,8,9]})
reg = smf.ols('a ~ 1 + b',data=df).fit(cov_type='HAC',cov_kwds={'maxlags':1})
print(reg.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      a   R-squared:                       0.281
Model:                            OLS   Adj. R-squared:                  0.201
Method:                 Least Squares   F-statistic:                     1.949
Date:                Sat, 31 Oct 2015   Prob (F-statistic):              0.196
Time:                        03:15:46   Log-Likelihood:                -22.603
No. Observations:                  11   AIC:                             49.21
Df Residuals:                       9   BIC:                             50.00
Df Model:                           1
Covariance Type:                  HAC
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.0576      2.661      0.773      0.439        -3.157     7.272
b              0.5595      0.401      1.396      0.163        -0.226     1.345
==============================================================================
Omnibus:                        0.361   Durbin-Watson:                   1.468
Prob(Omnibus):                  0.835   Jarque-Bera (JB):                0.331
Skew:                           0.321   Prob(JB):                        0.847
Kurtosis:                       2.442   Cond. No.                         19.1
==============================================================================
Warnings:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction
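If only the coefficient and its Newey-West standard error are needed, rather than the full summary, they can be read straight off the fitted results through the `params` and `bse` attributes. This is a minimal sketch that reuses the same data and fit; the names `coef_table`, `nw_se`, `b_coef` and `b_se` are just illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                   'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

# Same fit as above: OLS with a HAC (Newey-West) covariance and one lag.
reg = smf.ols('a ~ 1 + b', data=df).fit(cov_type='HAC',
                                        cov_kwds={'maxlags': 1})

# Point estimates and their HAC standard errors, side by side.
# The column names here are illustrative, not part of statsmodels.
coef_table = pd.DataFrame({'coef': reg.params, 'nw_se': reg.bse})
print(coef_table)

# A single coefficient and its standard error, looked up by term name.
b_coef = reg.params['b']
b_se = reg.bse['b']
print(b_coef, b_se)
```

Because the model was fitted through the formula interface, both attributes are indexed by term name (`Intercept`, `b`).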
Alternatively, you can use the get_robustcov_results method after fitting the model:
reg = smf.ols('a ~ 1 + b',data=df).fit()
new = reg.get_robustcov_results(cov_type='HAC',maxlags=1)
print(new.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      a   R-squared:                       0.281
Model:                            OLS   Adj. R-squared:                  0.201
Method:                 Least Squares   F-statistic:                     1.949
Date:                Sat, 31 Oct 2015   Prob (F-statistic):              0.196
Time:                        03:15:46   Log-Likelihood:                -22.603
No. Observations:                  11   AIC:                             49.21
Df Residuals:                       9   BIC:                             50.00
Df Model:                           1
Covariance Type:                  HAC
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.0576      2.661      0.773      0.439        -3.157     7.272
b              0.5595      0.401      1.396      0.163        -0.226     1.345
==============================================================================
Omnibus:                        0.361   Durbin-Watson:                   1.468
Prob(Omnibus):                  0.835   Jarque-Bera (JB):                0.331
Skew:                           0.321   Prob(JB):                        0.847
Kurtosis:                       2.442   Cond. No.                         19.1
==============================================================================
Warnings:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction
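The defaults for statsmodels are slightly different from the defaults for the equivalent method in R. The R method can be made equivalent to the statsmodels default (what I did above) by changing the vcov. argument in the R call to the following: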
temp.summ$coefficients <- unclass(coeftest(temp.lm,
vcov. = NeweyWest(temp.lm,lag=1,prewhite=FALSE)))
print(temp.summ$coefficients)
             Estimate Std. Error   t value  Pr(>|t|)
(Intercept) 2.0576208  2.6605060 0.7733945 0.4591196
b           0.5594796  0.4007965 1.3959193 0.1962142
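Note that the standard errors now agree with statsmodels, but the p-values still differ: the statsmodels summary above reports z statistics (normal distribution), while R's coeftest uses the t distribution. A sketch of one way to close that gap is to pass `use_t=True` to `fit`, which should bring the p-values in line with the R output above. The name `reg_t` is illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                   'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

# HAC covariance with one lag, as before, but with inference based on
# the t distribution (residual degrees of freedom) instead of the normal.
reg_t = smf.ols('a ~ 1 + b', data=df).fit(cov_type='HAC',
                                          cov_kwds={'maxlags': 1},
                                          use_t=True)

print(reg_t.bse)      # standard errors are unchanged by use_t
print(reg_t.tvalues)  # same ratios as the z column in the summary
print(reg_t.pvalues)  # now computed from the t distribution
```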
You can also still do Newey-West in pandas (0.17), although I believe the plan is to deprecate OLS in pandas:
print(pd.stats.ols.OLS(df.a,df.b,nw_lags=1))
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 11
Number of Degrees of Freedom: 2
R-squared: 0.2807
Adj R-squared: 0.2007
Rmse: 2.0880
F-stat (1, 9): 1.5943, p-value: 0.2384
Degrees of Freedom: model 1, resid 9
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     0.5595     0.4431       1.26     0.2384    -0.3090     1.4280
     intercept     2.0576     2.9413       0.70     0.5019    -3.7073     7.8226
*** The calculations are Newey-West adjusted with lags 1
---------------------------------End of Summary---------------------------------