我尝试使用 Scipy 的优化 curve_fit 将建模数据拟合为曲线来拟合一些分散的数据,但我得到了 (bX) 参数的负值,这是没有意义的。所以我不确定代码有什么问题。
# Define linear quadratic curve
def linq(x, aparam, bparam):
return np.exp(-(aparam*x+ bparam*x**2))
#new model
NX=46
def abX(x, aXparam, bXparam):
return np.exp(-(aXparam + bXparam)*x)*(1 + bXparam*x/NX)**NX
# define result array
avalues =[]
davalues =[]
bvalues=[]
dbvalues =[]
Chi2=[]
# Model parameters
Dmax = 10.
a1=0.740; b1=.1100
# ensemble size
Nensemble = 10
for iexp in range(1, Nensemble+1):
# Create the artificial dataset
nobs = int(Dmax + .5)
x = np.arange(.5,nobs)
N = linq(x,a1,b1)
Nfluct = 0.3*N*np.random.normal(size=nobs)
N = N + Nfluct
sig = np.zeros(nobs) + 0.3*N
# Fit the curve
start = (0,1.)
popt, pcov = curve_fit(abX,x,N,sigma = sig,p0 = start,absolute_sigma=True)
# add result to result vectors
avalues=avalues+[popt[0]]
bvalues=bvalues+[popt[1]]
davalues=davalues+[np.sqrt(pcov[0,0])]
dbvalues=dbvalues+[np.sqrt(pcov[1,1])]
print("mean aX value =",np.mean(avalues),"+/-",np.std(avalues)/np.sqrt(Nensemble-1),", gen =",a1)
print("mean bX value =",np.mean(bvalues),"+/-",np.std(bvalues)/np.sqrt(Nensemble-1),", gen =",b1)
mean aX value = 0.9829722097543518 +/- 0.019840535134131847 , gen = 0.74
mean bX value = -2.3048945875855527 +/- 0.024140253716692855 , gen = 0.11
首先让我们检查您的第二个模型是否适合第一个模型:
a1 = 0.740
b1 = 0.1100
def model1(x, a, b):
return np.exp(-(a * x + b * x **2))
def model2(x, a, b, N=46):
return np.exp(- (a + b) * x) * (1 + b * x / N) ** N
xlin = np.linspace(0.5, 9.5, 200)
ylin = model1(xlin, a1, b1)
popt, pcov = optimize.curve_fit(model2, xlin, ylin, p0=[0., 1.])
# (array([ 0.7524258 , -2.93622762]),
# array([[2.51911174e-07, 1.96469005e-06],
# [1.96469005e-06, 1.92412517e-05]]))
我们可以确认两个模型之间具有正确的适应度。
现在让我们检查一下这是最佳参数集:
def loss_factory(model, x, y):
@np.vectorize
def wrapper(a, b):
return 0.5 * np.sum(np.power(y - model(x, a, b), 2))
return wrapper
alin = np.linspace(0, 2, 300)
blin = np.linspace(-5, 5, 300)
A, B = np.meshgrid(alin, blin)
loss = loss_factory(model2, xlin, ylin)
SSE = loss(A, B)
我们可以看到在探索的参数区域中有两个最小值。由于初始参数,找到较低的那个(负
b
)。
要达到第二个最小值,只需更改初始参数:
popt, pcov = optimize.curve_fit(model2, xlin, ylin, p0=[0., 5.])
# (array([0.72716386, 3.44478853]),
# array([[ 2.99618275e-07, -2.70844576e-06],
# [-2.70844576e-06, 2.85220271e-05]]))
误差景观
SSE
的本质是由数据集决定的(这里严格遵循model2
),并且多个最小值的存在可能是偶次幂(N=46
)的影响。