I am running a MixedLM and want to push the output to Excel or CSV. The model code and output are below:
import statsmodels.formula.api as smf

model = smf.mixedlm('y_var ~ gas_prices', dfModel,
                    groups=dfModel['region'])
mdf = model.fit()
print(mdf.summary())
Mixed Linear Model Regression Results
======================================================================
Model:              MixedLM     Dependent Variable:     yVar
No. Observations:   420         Method:                 REML
No. Groups:         4           Scale:                  45635645671.2271
Min. group size:    105         Likelihood:             -5720.8133
Max. group size:    105         Converged:              Yes
Mean group size:    105.0
----------------------------------------------------------------------
              Coef.       Std.Err.      z     P>|z|    [0.025       0.975]
----------------------------------------------------------------------
Intercept   3241461.947  112718.823  28.757  0.000  3020537.112  3462386.781
gas_prices  -118128.471   46931.809  -2.517  0.012  -210113.126   -26143.816
xVar2           275.017     165.072   1.666  0.096      -48.518      598.553
groups RE         0.002
======================================================================
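I tried calling mdf.summary().to_excel, but that does not work. I also tried building a pandas DataFrame from mdf.summary() and pushing that to Excel, which did not work either.
Bonus points for giving each Excel output a unique file name, so that running several models does not overwrite earlier results.
How can I get this into Excel?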
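statsmodels has two underlying functions for building summary tables. Some models use only one of them, and some models have both summary() and summary2() methods on the results instance.
MixedLM uses summary2 as its summary, which builds the underlying tables as pandas DataFrames.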
I don't have a mixed-effects model available right now, so this example uses a GLM results instance, res1:
>>> summ2 = res1.summary2()
>>> len(summ2.tables)
2
>>> type(summ2.tables[1])
pandas.core.frame.DataFrame
>>> type(summ2.tables[0])
pandas.core.frame.DataFrame
Both tables can be used with pandas to create an Excel file, as shown in the deleted answer.
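Since the deleted answer is not reproduced here, the following is a minimal sketch of that pandas step. The helper name write_summary_tables and the file names are made up for illustration, and the two small DataFrames only stand in for the real tables list. Writing .xlsx assumes an Excel engine such as openpyxl is installed; any other suffix falls back to a single CSV file.

```python
from pathlib import Path

import pandas as pd


def write_summary_tables(tables, path):
    """Write a list of DataFrames to one .xlsx workbook or one CSV file.

    Hypothetical helper, not part of statsmodels.
    """
    path = Path(path)
    if path.suffix.lower() == ".xlsx":
        # One sheet per table; needs an Excel engine such as openpyxl.
        with pd.ExcelWriter(path) as writer:
            for i, table in enumerate(tables):
                table.to_excel(writer, sheet_name=f"table{i}")
    else:
        # CSV has no sheets, so the tables are written one after another,
        # separated by a blank line.
        with open(path, "w", newline="") as fh:
            for table in tables:
                table.to_csv(fh)
                fh.write("\n")
    return path


# Stand-ins for the summary tables. With a fitted model this would be
# write_summary_tables(mdf.summary().tables, "mixedlm_summary.xlsx")
info = pd.DataFrame({0: ["Model:", "Method:"], 1: ["MixedLM", "REML"]})
coefs = pd.DataFrame(
    {"Coef.": ["3241461.947", "0.002"], "Std.Err.": ["112718.823", ""]},
    index=["Intercept", "groups RE"],
)
out = write_summary_tables([info, coefs], "mixedlm_summary.csv")
```

Keeping the tables on separate sheets preserves their different column layouts, which a single stacked sheet would not.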
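The summary implementation, which is not available in MixedLM but is the default summary for most other models, has an as_csv method. However, it uses the same precision as the string version. The summary version currently does not build an underlying DataFrame.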
>>> summ = res1.summary()
>>> print(summ.as_csv())
Generalized Linear Model Regression Results
Dep. Variable: ,['y1', 'y2'] , No. Observations: , 303
Model: ,GLM , Df Residuals: , 282
Model Family: ,Binomial , Df Model: , 20
Link Function: ,logit , Scale: , 1.0000
Method: ,IRLS , Log-Likelihood: , -2998.6
Date: ,Sat, 19 May 2018, Deviance: , 4078.8
Time: ,08:42:45 , Pearson chi2: ,4.05e+03
No. Iterations:,5 , Covariance Type: ,nonrobust
, coef , std err , z ,P>|z| , [0.025 , 0.975]
x1 , -0.0168, 0.000, -38.749, 0.000, -0.018, -0.016
x2 , 0.0099, 0.001, 16.505, 0.000, 0.009, 0.011
x3 , -0.0187, 0.001, -25.182, 0.000, -0.020, -0.017
x4 , -0.0142, 0.000, -32.818, 0.000, -0.015, -0.013
x5 , 0.2545, 0.030, 8.498, 0.000, 0.196, 0.313
x6 , 0.2407, 0.057, 4.212, 0.000, 0.129, 0.353
x7 , 0.0804, 0.014, 5.775, 0.000, 0.053, 0.108
x8 , -1.9522, 0.317, -6.162, 0.000, -2.573, -1.331
x9 , -0.3341, 0.061, -5.453, 0.000, -0.454, -0.214
x10 , -0.1690, 0.033, -5.169, 0.000, -0.233, -0.105
x11 , 0.0049, 0.001, 3.921, 0.000, 0.002, 0.007
x12 , -0.0036, 0.000, -15.878, 0.000, -0.004, -0.003
x13 , -0.0141, 0.002, -7.391, 0.000, -0.018, -0.010
x14 , -0.0040, 0.000, -8.450, 0.000, -0.005, -0.003
x15 , -0.0039, 0.001, -4.059, 0.000, -0.006, -0.002
x16 , 0.0917, 0.015, 6.321, 0.000, 0.063, 0.120
x17 , 0.0490, 0.007, 6.574, 0.000, 0.034, 0.064
x18 , 0.0080, 0.001, 5.362, 0.000, 0.005, 0.011
x19 , 0.0002, 2.99e-05, 7.428, 0.000, 0.000, 0.000
x20 , -0.0022, 0.000, -6.445, 0.000, -0.003, -0.002
const, 1.9589, 1.547, 1.266, 0.205, -1.073, 4.990
(Pull requests that add options to the statsmodels summaries are welcome.)
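If the rounding in the summary tables is a problem, one workaround is to collect the estimates from the results instance directly. This is only a sketch: coef_table is a made-up name, and it assumes a model fit through the formula interface, so that params, bse, tvalues and pvalues are pandas Series and conf_int() returns a DataFrame. For MixedLM specifically, as far as I know the variance parameters in params are divided by scale, so those rows will not match the summary; the fixed-effects rows should.

```python
import pandas as pd


def coef_table(res):
    """Collect unrounded estimates from a fitted statsmodels result.

    Hypothetical helper. Assumes params, bse, tvalues and pvalues are
    pandas Series and that conf_int() returns a two-column DataFrame
    with one row per parameter.
    """
    ci = res.conf_int()
    return pd.DataFrame(
        {
            "coef": res.params,
            "std_err": res.bse,
            "z": res.tvalues,
            "p_value": res.pvalues,
            "ci_lower": ci.iloc[:, 0],
            "ci_upper": ci.iloc[:, 1],
        }
    )


# With the GLM results instance from above (not run here):
# coef_table(res1).to_csv("glm_coefs.csv")
```

The resulting DataFrame holds floats at full precision, so the number format can be chosen later in Excel rather than being fixed by the summary.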
As @Josef mentioned, MixedLMResults.summary is internally a summary2, so try mdf.summary().tables.
Note, however, that the dtype may be object rather than numeric, as in the following example:
model = smf.mixedlm(
    "log_tv ~ C(group, Treatment(reference='Control'))*day",
    data=tv_data,
    groups=tv_data['animal'],
    re_formula='~day',
)
est = model.fit()
est.summary().tables[1].info()
<class 'pandas.core.frame.DataFrame'>
Index: 11 entries, Intercept to day Var
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Coef. 11 non-null object
1 Std.Err. 11 non-null object
2 z 11 non-null object
3 P>|z| 11 non-null object
4 [0.025 11 non-null object
5 0.975] 11 non-null object
dtypes: object(6)
memory usage: 616.0+ bytes
Why? Because tables[1] contains some empty cells (the variance-component rows have no z, P>|z|, or confidence-interval values):
| | Coef. | Std.Err. | z | P>|z| | [0.025 | 0.975] |
|:-----------------------------------------------------------------|--------:|-----------:|:-------|:--------|:---------|:---------|
| Intercept | 4.301 | 0.072 | 60.137 | 0.000 | 4.161 | 4.441 |
| C(group, Treatment(reference='Control'))[T.Therapy_A] | -0.003 | 0.101 | -0.025 | 0.980 | -0.201 | 0.196 |
| C(group, Treatment(reference='Control'))[T.Therapy_B] | 0.109 | 0.101 | 1.081 | 0.280 | -0.089 | 0.308 |
| C(group, Treatment(reference='Control'))[T.Therapy_Combo_AB] | 0.116 | 0.101 | 1.147 | 0.251 | -0.082 | 0.314 |
| day | 0.181 | 0.01 | 18.938 | 0.000 | 0.162 | 0.200 |
| C(group, Treatment(reference='Control'))[T.Therapy_A]:day | -0.046 | 0.014 | -3.420 | 0.001 | -0.073 | -0.020 |
| C(group, Treatment(reference='Control'))[T.Therapy_B]:day | -0.071 | 0.014 | -5.283 | 0.000 | -0.098 | -0.045 |
| C(group, Treatment(reference='Control'))[T.Therapy_Combo_AB]:day | -0.132 | 0.014 | -9.800 | 0.000 | -0.159 | -0.106 |
| Group Var | 0.039 | 0.055 | | | | |
| Group x day Cov | 0.003 | 0.005 | | | | |
| day Var | 0.001 | 0.001 | | | | |
One solution is to convert the object columns to float:
import pandas as pd

df = est.summary().tables[1]
for col in df.columns:
    # cells that cannot be parsed (the empty ones) become NaN
    df[col] = pd.to_numeric(df[col], errors='coerce')
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 11 entries, Intercept to day Var
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Coef. 11 non-null float64
1 Std.Err. 11 non-null float64
2 z 8 non-null float64
3 P>|z| 8 non-null float64
4 [0.025 8 non-null float64
5 0.975] 8 non-null float64
dtypes: float64(6)
memory usage: 616.0+ bytes
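For the bonus part of the question, unique file names, one simple option is to put a timestamp in the name and add a counter if that name is already taken. The helper unique_path and the naming scheme below are illustrative choices, not something statsmodels or pandas provides.

```python
from datetime import datetime
from pathlib import Path


def unique_path(stem, suffix=".xlsx", directory="."):
    """Return a path that does not exist yet: <stem>_<timestamp>[_n]<suffix>.

    Hypothetical helper; the naming scheme is an arbitrary choice.
    """
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    candidate = Path(directory) / f"{stem}_{stamp}{suffix}"
    n = 1
    # Two runs within the same second get a numeric suffix.
    while candidate.exists():
        candidate = Path(directory) / f"{stem}_{stamp}_{n}{suffix}"
        n += 1
    return candidate


first = unique_path("mixedlm_gas_prices", suffix=".csv")
first.touch()  # stands in for df.to_csv(first) or df.to_excel(first)
second = unique_path("mixedlm_gas_prices", suffix=".csv")
```

Including the model's formula or dependent variable in the stem makes it easier to tell the files apart later.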