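How can we calculate the mean absolute error (MAE) for zero-inflated Poisson regression and zero-inflated negative binomial regression (R or Python)?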


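I am trying to use Python to calculate the mean absolute error (MAE) for a zero-inflated Poisson regression and a zero-inflated negative binomial regression. I split the data into training and test sets. I am using the code below, but it does not work.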

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm
import statsmodels.formula.api as smf
import tensorflow as tf
df = pd.read_excel('....', sheet_name='Sheet1')
print(df.head())
X = df[['a', 'b', 'c', 'd', 'e', 'f']]
y = df['g']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from statsmodels.discrete.count_model import ZeroInflatedPoisson
y_zip = y_train.values

y_zip_test = y_test.values

X_count =  X_train.values # Predictors for count part
X_zero = X_train.values  # Predictors for zero-inflation part

X_count_test = X_test.values
X_zero_test = X_test.values

# Add a constant for the intercept
X_count = sm.add_constant(X_count)
X_zero = sm.add_constant(X_zero)

# Fit the ZIP model
zip_model = ZeroInflatedPoisson(endog=y_zip, exog=X_count, exog_infl=X_zero, inflation='logit')
zip_model_fit = zip_model.fit()
print(zip_model_fit.summary())


# Make predictions
y_pred = zip_model_fit.predict(X_count_test)

# Calculate MAE
mae = np.mean(np.abs(y_zip_test - y_pred))
print(f'Mean Absolute Error: {mae}')

The result is as follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 33
     29 print(zip_model_fit.summary())
     32 # Make predictions
---> 33 y_pred = zip_model_fit.predict(X_count_test)
     35 # Calculate MAE của test
     36 mae = np.mean(np.abs(y_zip_test - y_pred))

File ~\anaconda3\envs\tf\lib\site-packages\statsmodels\base\model.py:1174, in Results.predict(self, exog, transform, *args, **kwargs)
   1127 """
   1128 Call self.model.predict with self.params as the first argument.
   1129 
   (...)
   1169 returned prediction.
   1170 """
   1171 exog, exog_index = self._transform_predict_exog(exog,
   1172                                                 transform=transform)
-> 1174 predict_results = self.model.predict(self.params, exog, *args,
   1175                                      **kwargs)
   1177 if exog_index is not None and not hasattr(predict_results,
   1178                                           'predicted_values'):
   1179     if predict_results.ndim == 1:

File ~\anaconda3\envs\tf\lib\site-packages\statsmodels\discrete\count_model.py:453, in GenericZeroInflated.predict(self, params, exog, exog_infl, exposure, offset, which, y_values)
    449 params_main = params[self.k_inflate:]
    451 prob_main = 1 - self.model_infl.predict(params_infl, exog_infl)
--> 453 lin_pred = np.dot(exog, params_main[:self.exog.shape[1]]) + exposure + offset
    455 # Refactor: This is pretty hacky,
    456 # there should be an appropriate predict method in model_main
    457 # this is just prob(y=0 | model_main)
    458 tmp_exog = self.model_main.exog

ValueError: shapes (21,6) and (7,) not aligned: 6 (dim 1) != 7 (dim 0)

Could you suggest a solution?

I have tried to compute the MAE, but I keep running into errors.

python machine-learning non-linear-regression poisson
1 answer

You perform the following steps on the training data.

# Add a constant for the intercept
X_count = sm.add_constant(X_count)
X_zero = sm.add_constant(X_zero)

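However, you do not do this for the test data. I believe this may be your problem, since the error reports a shape mismatch: the test matrix has 6 columns, while the fitted model expects 7 coefficients (the 6 predictors plus the constant).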
