Model output `to_excel` in Python?


I am running a MixedLM and want to push the output to Excel or CSV. See the model code and output below:

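import statsmodels.formula.api as smf

# dfModel is the user's DataFrame with columns y_var, gas_prices and region
model = smf.mixedlm('y_var ~ gas_prices', dfModel,
                    groups=dfModel['region'])
mdf = model.fit()
print(mdf.summary())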

                Mixed Linear Model Regression Results
======================================================================
Model:                MixedLM   Dependent Variable:   yVar 
No. Observations:     420       Method:               REML            
No. Groups:           4         Scale:                45635645671.2271
Min. group size:      105       Likelihood:           -5720.8133      
Max. group size:      105       Converged:            Yes             
Mean group size:      105.0                                           
----------------------------------------------------------------------
              Coef.     Std.Err.    z    P>|z|    [0.025      0.975]  
----------------------------------------------------------------------
Intercept  3241461.947 112718.823 28.757 0.000 3020537.112 3462386.781
gas_prices -118128.471  46931.809 -2.517 0.012 -210113.126  -26143.816
xVar2          275.017    165.072  1.666 0.096     -48.518     598.553
groups RE        0.002                                                
======================================================================

I tried pushing it with

mdf.summary().to_excel
but that does not work. I also tried using
mdf.summary()
to create a pandas DataFrame and then pushing that to Excel, which does not work either.

Extra credit here would be creating a unique file name for each output in Excel, so that if I run several models they do not overwrite each other.

How can I get this into Excel?

python python-3.x pandas statsmodels
2 Answers
1 vote

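statsmodels has two underlying functions for building summary tables. Some models use only one of them, and some models have both

summary()
summary2()
methods on the results instance.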

MixedLM uses

summary2
as its
summary
, which builds the underlying tables as pandas DataFrames.

I don't have a mixed effects model available right now, so this is for a GLM model results instance res1:

>>> summ2 = res1.summary2()
>>> len(summ2.tables)
2

>>> type(summ2.tables[1])
pandas.core.frame.DataFrame

>>> type(summ2.tables[0])
pandas.core.frame.DataFrame

These two tables can be used with pandas, as shown in the deleted answer, to create an Excel file.
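A minimal sketch of that idea, which also covers the unique file name asked for in the question. The helper names `unique_xlsx_path` and `tables_to_excel` are hypothetical, not part of statsmodels or pandas, and the sketch assumes an Excel engine such as openpyxl is installed for pandas to use.

```python
from datetime import datetime
from pathlib import Path
from uuid import uuid4

import pandas as pd


def unique_xlsx_path(prefix="model_summary", directory="."):
    """Return a path that embeds a timestamp and a short random suffix."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(directory) / f"{prefix}_{stamp}_{uuid4().hex[:8]}.xlsx"


def tables_to_excel(tables, path):
    """Write each DataFrame in `tables` to its own sheet of one workbook."""
    # pandas needs an Excel engine (for example openpyxl) for .xlsx output
    with pd.ExcelWriter(path) as writer:
        for i, table in enumerate(tables):
            table.to_excel(writer, sheet_name=f"table_{i}")
    return path


# Usage, assuming `mdf` is the fitted MixedLM result from the question:
# out = tables_to_excel(mdf.summary().tables, unique_xlsx_path("mixedlm"))
```

The random suffix keeps two models fitted within the same second from overwriting each other; a plain counter would work as well if reproducible names matter more.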

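The
summary
implementation, which is not available in MixedLM but is the default summary for most other models, has an
as_csv
method. However, it uses the same precision as the string version. The
summary
version currently does not build an underlying DataFrame.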

>>> summ = res1.summary()
>>> print(summ.as_csv())
          Generalized Linear Model Regression Results           
Dep. Variable: ,['y1', 'y2']    ,  No. Observations:  ,   303   
Model:         ,GLM             ,  Df Residuals:      ,   282   
Model Family:  ,Binomial        ,  Df Model:          ,    20   
Link Function: ,logit           ,  Scale:             ,  1.0000 
Method:        ,IRLS            ,  Log-Likelihood:    , -2998.6 
Date:          ,Sat, 19 May 2018,  Deviance:          ,  4078.8 
Time:          ,08:42:45        ,  Pearson chi2:      ,4.05e+03 
No. Iterations:,5               ,  Covariance Type:   ,nonrobust
     ,   coef   , std err ,    z    ,P>|z| ,  [0.025 ,  0.975] 
x1   ,   -0.0168,    0.000,  -38.749, 0.000,   -0.018,   -0.016
x2   ,    0.0099,    0.001,   16.505, 0.000,    0.009,    0.011
x3   ,   -0.0187,    0.001,  -25.182, 0.000,   -0.020,   -0.017
x4   ,   -0.0142,    0.000,  -32.818, 0.000,   -0.015,   -0.013
x5   ,    0.2545,    0.030,    8.498, 0.000,    0.196,    0.313
x6   ,    0.2407,    0.057,    4.212, 0.000,    0.129,    0.353
x7   ,    0.0804,    0.014,    5.775, 0.000,    0.053,    0.108
x8   ,   -1.9522,    0.317,   -6.162, 0.000,   -2.573,   -1.331
x9   ,   -0.3341,    0.061,   -5.453, 0.000,   -0.454,   -0.214
x10  ,   -0.1690,    0.033,   -5.169, 0.000,   -0.233,   -0.105
x11  ,    0.0049,    0.001,    3.921, 0.000,    0.002,    0.007
x12  ,   -0.0036,    0.000,  -15.878, 0.000,   -0.004,   -0.003
x13  ,   -0.0141,    0.002,   -7.391, 0.000,   -0.018,   -0.010
x14  ,   -0.0040,    0.000,   -8.450, 0.000,   -0.005,   -0.003
x15  ,   -0.0039,    0.001,   -4.059, 0.000,   -0.006,   -0.002
x16  ,    0.0917,    0.015,    6.321, 0.000,    0.063,    0.120
x17  ,    0.0490,    0.007,    6.574, 0.000,    0.034,    0.064
x18  ,    0.0080,    0.001,    5.362, 0.000,    0.005,    0.011
x19  ,    0.0002, 2.99e-05,    7.428, 0.000,    0.000,    0.000
x20  ,   -0.0022,    0.000,   -6.445, 0.000,   -0.003,   -0.002
const,    1.9589,    1.547,    1.266, 0.205,   -1.073,    4.990
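If the limited precision is acceptable, the CSV text can be written straight to a file. This is a small sketch; `save_summary_csv` is a hypothetical helper, and it only assumes that the object passed in has an `as_csv()` method returning a string, as the `summary()` result above does.

```python
from pathlib import Path


def save_summary_csv(summary, path):
    """Write the text returned by summary.as_csv() to `path`."""
    text = summary.as_csv()
    Path(path).write_text(text, encoding="utf-8")
    return Path(path)


# Usage, assuming `res1` is a fitted result whose summary() provides as_csv
# (for example the GLM result above):
# save_summary_csv(res1.summary(), "glm_summary.csv")
```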

(Pull requests for additional options in the statsmodels summary are welcome.)


0 votes

As @Josef mentioned:

MixedLMResults.summary is internally a summary2, so try mdf.summary().tables

However, the dtype of the columns may be "object" rather than numeric, as in the following example:

model = smf.mixedlm(
    "log_tv ~ C(group, Treatment(reference='Control'))*day",
    data=tv_data,
    groups=tv_data['animal'],
    re_formula='~day',
)
est = model.fit()

est.summary().tables[1].info()
<class 'pandas.core.frame.DataFrame'>
Index: 11 entries, Intercept to day Var
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Coef.     11 non-null     object
 1   Std.Err.  11 non-null     object
 2   z         11 non-null     object
 3   P>|z|     11 non-null     object
 4   [0.025    11 non-null     object
 5   0.975]    11 non-null     object
dtypes: object(6)
memory usage: 616.0+ bytes

Why? Because "tables[1]" contains some empty cells:

|                                                                  |   Coef. |   Std.Err. | z      | P>|z|   | [0.025   | 0.975]   |
|:-----------------------------------------------------------------|--------:|-----------:|:-------|:--------|:---------|:---------|
| Intercept                                                        |   4.301 |      0.072 | 60.137 | 0.000   | 4.161    | 4.441    |
| C(group, Treatment(reference='Control'))[T.Therapy_A]            |  -0.003 |      0.101 | -0.025 | 0.980   | -0.201   | 0.196    |
| C(group, Treatment(reference='Control'))[T.Therapy_B]            |   0.109 |      0.101 | 1.081  | 0.280   | -0.089   | 0.308    |
| C(group, Treatment(reference='Control'))[T.Therapy_Combo_AB]     |   0.116 |      0.101 | 1.147  | 0.251   | -0.082   | 0.314    |
| day                                                              |   0.181 |      0.01  | 18.938 | 0.000   | 0.162    | 0.200    |
| C(group, Treatment(reference='Control'))[T.Therapy_A]:day        |  -0.046 |      0.014 | -3.420 | 0.001   | -0.073   | -0.020   |
| C(group, Treatment(reference='Control'))[T.Therapy_B]:day        |  -0.071 |      0.014 | -5.283 | 0.000   | -0.098   | -0.045   |
| C(group, Treatment(reference='Control'))[T.Therapy_Combo_AB]:day |  -0.132 |      0.014 | -9.800 | 0.000   | -0.159   | -0.106   |
| Group Var                                                        |   0.039 |      0.055 |        |         |          |          |
| Group x day Cov                                                  |   0.003 |      0.005 |        |         |          |          |
| day Var                                                          |   0.001 |      0.001 |        |         |          |          |

One solution is to convert the object-typed columns to float:

import pandas as pd

df = est.summary().tables[1]
# Coerce each column to numeric; empty cells become NaN
for col in df.columns:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 11 entries, Intercept to day Var
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Coef.     11 non-null     float64
 1   Std.Err.  11 non-null     float64
 2   z         8 non-null      float64
 3   P>|z|     8 non-null      float64
 4   [0.025    8 non-null      float64
 5   0.975]    8 non-null      float64
dtypes: float64(6)
memory usage: 616.0+ bytes
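The same conversion can be done in one call with `DataFrame.apply`. The sketch below uses a small stand-in table (made-up rows mirroring the layout above, not real model output) so it runs without a fitted model, and it also shows one way to separate the fixed-effect rows from the variance-component rows whose z column is blank.

```python
import pandas as pd

# Stand-in for est.summary().tables[1]: formatted strings with blank cells
table = pd.DataFrame(
    {"Coef.": ["4.301", "0.039"], "z": ["60.137", ""], "P>|z|": ["0.000", ""]},
    index=["Intercept", "Group Var"],
)

# Convert every column at once; unparseable cells (the blanks) become NaN
numeric = table.apply(pd.to_numeric, errors="coerce")

# Split fixed-effect rows (z present) from variance-component rows (z blank)
fixed_effects = numeric[numeric["z"].notna()]
variance_components = numeric[numeric["z"].isna()]
```

Once the columns are numeric, `numeric.to_excel(...)` or `numeric.to_csv(...)` stores real numbers rather than text.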