我需要绘制预测和置信区间,并且需要使用 python 并且仅使用以下包。如何在同一模型中绘制两个区间。我在 ChatGPT 的帮助下成功绘制了预测区间。这是包含数据集的代码。
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#dataframe
data = {
'X': [55641, 55681, 55637, 55825, 55772, 55890, 56068, 56299, 56825, 57205,
57562, 57850, 57975, 57992, 58240, 58414, 58561, 59066, 58596, 58631,
58758, 59037],
'Y': [21886, 21934, 21699, 21901, 21812, 21714, 21932, 22086, 22265, 22551,
22736, 22301, 22518, 22580, 22618, 22890, 23112, 23315, 22865, 22788,
22949, 23149]
}
df = pd.DataFrame(data)
#OLS
model = smf.ols(formula='Y ~ X', data=df)
results = model.fit()
print(results.summary())
#calculating prediction intevals
predictions = results.get_prediction(df)
prediction_summary_frame = predictions.summary_frame(alpha=0.05)
#data points
plt.scatter(df['X'], df['Y'], color='black', label='Data')
#regression line
plt.plot(df['X'], results.fittedvalues, color='#58C9F4', label='Regression Line')
#prediction invterval
plt.fill_between(df['X'], prediction_summary_frame['obs_ci_lower'], prediction_summary_frame['obs_ci_upper'], color='grey', alpha=0.2, label='95% Prediction Interval')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Linear Regression with Prediction Intervals')
plt.legend()
plt.show()
我应该如何继续绘制两个区间?
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Dataframe
data = {
'X': [55641, 55681, 55637, 55825, 55772, 55890, 56068, 56299, 56825, 57205,
57562, 57850, 57975, 57992, 58240, 58414, 58561, 59066, 58596, 58631,
58758, 59037],
'Y': [21886, 21934, 21699, 21901, 21812, 21714, 21932, 22086, 22265, 22551,
22736, 22301, 22518, 22580, 22618, 22890, 23112, 23315, 22865, 22788,
22949, 23149]
}
df = pd.DataFrame(data)
# OLS
model = smf.ols(formula='Y ~ X', data=df)
results = model.fit()
print(results.summary())
# Calculating prediction intervals
predictions = results.get_prediction(df)
prediction_summary_frame = predictions.summary_frame(alpha=0.05)
# Calculating confidence intervals
confidence_intervals = results.conf_int(alpha=0.05)
# Data points
plt.scatter(df['X'], df['Y'], color='black', label='Data')
# Regression line
plt.plot(df['X'], results.fittedvalues, color='#58C9F4', label='Regression Line')
# Prediction interval
plt.fill_between(df['X'], prediction_summary_frame['obs_ci_lower'], prediction_summary_frame['obs_ci_upper'], color='grey', alpha=0.2, label='95% Prediction Interval')
# Confidence interval
plt.fill_between(df['X'], prediction_summary_frame['mean_ci_lower'], prediction_summary_frame['mean_ci_upper'], color='blue', alpha=0.2, label='95% Confidence Interval')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Linear Regression with Prediction and Confidence Intervals')
plt.legend()
plt.show()
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: 0.919
Model: OLS Adj. R-squared: 0.915
Method: Least Squares F-statistic: 227.5
Date: Tue, 11 Jun 2024 Prob (F-statistic): 2.17e-12
Time: 06:30:58 Log-Likelihood: -140.06
No. Observations: 22 AIC: 284.1
Df Residuals: 20 BIC: 286.3
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 559.4600 1450.698 0.386 0.704 -2466.642 3585.562
X 0.3815 0.025 15.084 0.000 0.329 0.434
==============================================================================
Omnibus: 0.314 Durbin-Watson: 1.479
Prob(Omnibus): 0.855 Jarque-Bera (JB): 0.390
Skew: -0.242 Prob(JB): 0.823
Kurtosis: 2.562 Cond. No. 2.64e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.64e+06. This might indicate that there are
strong multicollinearity or other numerical problems.