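How can I get prediction intervals / prediction bands from SciPy's curve-fitting function?
More specifically, how can I get these prediction bands for the hyperbolic curve commonly used in decline curve analysis?
Any help would be greatly appreciated.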
import pandas as pd
import numpy as np
from datetime import timedelta
from scipy.optimize import curve_fit
def hyperbolic_equation(t, qi, b, di):
    return qi/((1.0+b*di*t)**(1.0/b))
df1 = pd.DataFrame({ 'cumsum_days': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                     'prod': [800, 900, 1200, 700, 600,
                              550, 500, 650, 625, 600,
                              550, 525, 500, 400, 350]})
qi = max(df1['prod'])
#Hyperbolic curve fit the data to get best fit equation
popt_hyp, pcov_hyp = curve_fit(hyperbolic_equation, df1['cumsum_days'], df1['prod'],bounds=(0, [qi,1,20]))
#Passing t to estimate the coefficients:
def fitted_hyperbolic_equation(t):
    return popt_hyp[0]/((1.0+popt_hyp[1]*popt_hyp[2]*t)**(1.0/popt_hyp[1]))
#Creating future time to predict on:
df2 = pd.DataFrame({ 'future_days': [16,17,18,19,20]})
fitted_hyperbolic_equation(df2.future_days)
16 388.259631
17 368.389649
18 349.754534
19 332.264306
20 315.836485
I have my future values, but how do I generate a 95% confidence/prediction band with SciPy? Any help would be greatly appreciated.
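I'm not sure I fully understand, but I think you are asking for the uncertainties in the predicted values of a curve-fitting model.
I would recommend using lmfit for this (disclaimer: I am an author), as it provides methods for this kind of calculation. I'm afraid your model and data don't match very well, so the uncertainties are quite large.
Using lmfit and plain numpy arrays instead of pandas dataframes (you can use dataframes, but they are a distraction here, and the fit really does need numpy arrays), your analysis might look like this: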
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
cumsum_days = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
prod = np.array([800, 900, 1200, 700, 600, 550, 500, 650, 625, 600, 550,
                 525, 500, 400, 350])
# plot data
plt.plot(cumsum_days, prod, 'bo', label='data')
def hyperbolic_equation(t, qi, b, di):
    return qi/((1.0+b*di*t)**(1.0/max(b, 1.e-50)))
# build Model
hmodel = Model(hyperbolic_equation)
# create lmfit Parameters, named from the arguments of `hyperbolic_equation`
# note that you really must provide initial values.
params = hmodel.make_params(qi=1000, b=0.5, di=0.1)
# set bounds on parameters
params['qi'].min=0
params['b'].min=0
params['di'].min=0
# do fit, print resulting parameters
result = hmodel.fit(prod, params, t=cumsum_days)
print(result.fit_report())
# plot best fit: not that great of fit, really
plt.plot(cumsum_days, result.best_fit, 'r--', label='fit')
# calculate the (1 sigma) uncertainty in the predicted model
# and plot that as a confidence band
dprod = result.eval_uncertainty(result.params, sigma=1)
plt.fill_between(cumsum_days,
                 result.best_fit-dprod,
                 result.best_fit+dprod,
                 color="#AB8888",
                 label='uncertainty band of fit')
# now evaluate the model for other values, predicting future values
future_days = np.array([16,17,18,19,20])
future_prod = result.eval(t=future_days)
plt.plot(future_days, future_prod, 'k--', label='prediction')
# ...and calculate the 1-sigma uncertainty in the future prediction
# for a 95% confidence level, you'd want roughly `sigma=2` here:
future_dprod = result.eval_uncertainty(t=future_days, sigma=1)
print("### Prediction\n# Day Prod Uncertainty")
# loop variable named `value` so it does not shadow the `prod` data array
for day, value, eps in zip(future_days, future_prod, future_dprod):
    print(" {:.1f} {:.3f} +/- {:.3f}".format(day, value, eps))
plt.fill_between(future_days,
                 future_prod-future_dprod,
                 future_prod+future_dprod,
                 color="#ABABAB",
                 label='uncertainty band of prediction')
plt.legend(loc='lower left')
plt.show()
This will print out the fit statistics and parameter values of the result:
[[Model]]
    Model(hyperbolic_equation)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 21
    # data points      = 15
    # variables        = 3
    chi-square         = 238946.482
    reduced chi-square = 19912.2068
    Akaike info crit   = 151.139170
    Bayesian info crit = 153.263321
[[Variables]]
    qi:  993.608482 +/- 163.710950 (16.48%) (init = 1000)
    b:   0.22855837 +/- 2.07615175 (908.37%) (init = 0.5)
    di:  0.06551315 +/- 0.06250023 (95.40%) (init = 0.1)
[[Correlations]] (unreported correlations are < 0.100)
    C(b, di)  = 0.963
    C(qi, di) = 0.888
    C(qi, b)  = 0.771
### Prediction
# Day Prod Uncertainty
 16.0 388.258 +/- 1080.106
 17.0 368.387 +/- 1106.336
 18.0 349.752 +/- 1130.091
 19.0 332.261 +/- 1151.634
 20.0 315.833 +/- 1171.196
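It also produces a plot of the data, the best fit, and the uncertainty bands for both the fit and the prediction.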
In your question, you did not check the quality of the fit, either statistically or graphically. Really, you will need to do that.
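As a rough illustration of the statistical side of that check, here is a minimal sketch using only SciPy and numpy. The starting values and lower bounds are copied from the lmfit example above; the small positive lower bound on `b` (to avoid dividing by zero) and all variable names are my own assumptions. It reports the reduced chi-square, the RMS residual, and the standard error of each parameter taken from the diagonal of the covariance matrix.

```python
import numpy as np
from scipy.optimize import curve_fit

def hyperbolic_equation(t, qi, b, di):
    return qi / (1.0 + b * di * t) ** (1.0 / b)

days = np.arange(1, 16, dtype=float)
prod = np.array([800, 900, 1200, 700, 600, 550, 500, 650, 625, 600,
                 550, 525, 500, 400, 350], dtype=float)

# Explicit starting values (same as the lmfit example); the tiny lower
# bound on b is an assumption made here to keep 1/b finite.
p0 = [1000.0, 0.5, 0.1]
bounds = ([0.0, 1e-6, 0.0], [np.inf, np.inf, np.inf])
popt, pcov = curve_fit(hyperbolic_equation, days, prod, p0=p0, bounds=bounds)

# Residual-based statistics
residuals = prod - hyperbolic_equation(days, *popt)
dof = len(days) - len(popt)                  # degrees of freedom
chi_square = float(np.sum(residuals ** 2))
reduced_chi_square = chi_square / dof
rms_residual = float(np.sqrt(np.mean(residuals ** 2)))

# One-standard-deviation parameter errors from the covariance matrix
perr = np.sqrt(np.diag(pcov))

print("reduced chi-square = {:.1f}".format(reduced_chi_square))
print("RMS residual       = {:.1f}".format(rms_residual))
for name, value, err in zip(("qi", "b", "di"), popt, perr):
    print("{:>2s} = {:.5g} +/- {:.5g} ({:.0f}%)".format(
        name, value, err, 100.0 * err / abs(value)))
```

Relative errors near or above 100% are a sign that the data do not constrain that parameter. For the graphical side, plotting `residuals` against `days` and looking for trends is the usual counterpart.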
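You also used curve_fit but did not provide initial values. Although no underlying fitting routine would ever support this, and all of them require explicit initial values, curve_fit permits it without warning or justification and asserts that all starting values will be 1.0. Really, you must supply initial values.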