S形曲线形状数据的优化拟合

问题描述 投票:0回答:1

我有一个日期时间和海拔数据集(下面是实际的现场数据),它具有 sigmoid 形状,我想为其拟合 sigmoid 曲线,优化参数以获得最佳拟合。 这对我来说似乎是直截了当的前言,但我已经陷入了僵局,我无法摆脱它。 任何人可以提供的任何帮助将不胜感激。

数据如下: | | 日期时间 | SP 2 | |:-- |:----------------|:-------- | |2 |2009-08-06 08:00:00|30.7359 |5 |2010-07-28 08:00:00|30.7364 |6 |2011-06-13 08:00:00|30.7333 |8 |2013-08-08 08:00:00|30.7218 |10|2015-07-20 08:00:00|30.7047 |11|2016-07-08 08:00:00|30.6936 |12|2017-08-01 08:00:00|30.6749 |13|2018-07-30 08:00:00|30.6610 |14|2019-07-11 08:00:00|30.6433 |15|2020-07-01 08:00:00|30.6256 |16|2021-06-29 08:00:00|30.6084 |17|2022-07-11 08:00:00|30.5980 |18|2023-08-01 08:00:00|30.5836 |40|2024-07-26 08:00:00|30.5767

上面是从更大的数据帧(df_LS_MON)绘制的以下代码生成的数据帧:

    #  Input data.  This is the actual field data.  Datetime of acquisition, 
    #  and elevation in meters of a monument SP 2 from a high precision level survey.
data = pd.DataFrame()
data.loc[:,'Date Time'] = df_LS_MON['Date Time']
data.loc[:,'SP 2'] = df_LS_MON['SP 2']
print(data)

我可以绘制一条类似于数据的 sigmoid 曲线,但它没有对此进行优化,请参见下文:

    #  Function to predict mine subsidence trend:
    #  This doesn't optimize the sigmoid curve fit, it just draws a curve from 
       parameters I give it, 
    #  but it gets me nearby the correct parameters.

    #  Convert 'Date Time' to numerical representation (e.g., days since a reference 
       date)
    #  Choose a reference date (e.g., the first date in the dataset)
reference_date = df_LS_MON['Date Time'].min()
    # Calculate days since reference date
t = (df_LS_MON['Date Time'] - reference_date).dt.days.values

    #  The input to draw the sigmoid curve
sig_rate = 2.6     # Lower is narrower horizontally, higher is broader horizontally
sig_scale = .2     # Lower is narrower vertically, higher is broader vertically
t_OS = 6.5         # Consider adjusting this based on the reference date, Lower draws 
                     the curve up and left, higher extends the curve down and right
elev_offset = 30.56     # The offset to the Y direction (elevation) to move from near 
                          0 to near the data

    # A list to hold the output
sigmoid_curve = []

    # Define the function to calculate the sigmoid curve from the input values above
def mine_sub(t, rate=sig_rate, scale=sig_scale, time_offset=t_OS):
  y = (scale * (1 - (1 / (1 + np.exp(-(t - time_offset) / rate))))) + elev_offset
  return (y)
    """
    Calculates the sigmoid function.
    Args:
    t: The input value or an array of values (time, represented as days since a 
       reference date).
    rate: The rate of change of the sigmoid function.
    scale: The scale of the sigmoid function.
    time_offset: The time offset of the sigmoid function.
    Returns:
    The sigmoid function output.
    """

    #  Call the function and populate the list sigmoid_curve with the output
for i in range (len(t)):
  mine_sub(t=i)
  sigmoid_curve.append(mine_sub(t=i))

    #  Put the function output in the 'data' dataframe
sigmoid_floats = [float(sigmoid_curve) for sigmoid_curve in sigmoid_curve]
data['Sig_Curve'] = sigmoid_floats

    #  Plot the data and the sigmoid curve drawn from the function output
plt.figure(figsize=(10, 5), dpi=150)

plt.plot(data['Date Time'], data['SP 2'], label='SP 2')
plt.plot(data['Date Time'], data['Sig_Curve'], label='Sigmoid Curve')

plt.title('data SP 2 (Only End of July Surveys) with the Sigmoid Curve')
plt.legend(loc='upper right')
plt.xlim(dt.datetime(2008,1,1), dt.datetime(2024,12,31))
plt.show()

    #  Displpay the function output in sigmoid_curve and the time input (elapsed 
       days)
display(data)
display(t)

上面的结果如下:

SP 2 数据图和绘制的 S 形曲线

日期时间 SP 2 Sig_曲线
2 2009-08-06 08:00:00 30.7359 30.744828
5 2010-07-28 08:00:00 30.7364 30.738478
6 2011-06-13 08:00:00 30.7333 30.729902
8 2013-08-08 08:00:00 30.7218 30.718700
10 2015-07-20 08:00:00 30.7047 30.704686
11 2016-07-08 08:00:00 30.6936 30.688072
12 2017-08-01 08:00:00 30.6749 30.669586
13 2018-07-30 08:00:00 30.6610 30.650414
14 2019-07-11 08:00:00 30.6433 30.631928
15 2020-07-01 08:00:00 30.6256 30.615314
16 2021-06-29 08:00:00 30.6084 30.601300
17 2022-07-11 08:00:00 30.5980 30.590098
18 2023-08-01 08:00:00 30.5836 30.581522
40 2024-07-26 08:00:00 30.5767 30.575172

以下是数组 t,自第一个日期以来经过的天数

数组([ 0, 356, 676, 1463, 2174, 2528, 2917, 3280, 3626, 3982, 4345, 4722、5108、5468])

因此,优化后的 sigmoid 曲线可能应该类似于上面的内容,这就是我优化最适合数据的曲线的起点。

以下是我尝试用来优化数据拟合的代码。 我大量借鉴了之前的文章,使用 Python 将 sigmoid 函数(“S”形曲线)拟合到数据,它确实很有帮助,我在优化方面也遇到了类似的问题,但我找不到方法解决了这个问题。

        #  This code is to optimize the sigmoid curve parameters to find the best fit to the SP 2 elevation data
    #  Starting input parameters that should be close, and are taken from the code above that just drew the sigmoid curve
#sig_rate = 2.6
#sig_scale = .2
#t_OS = 6.5  # Consider adjusting this based on your reference date
#elev_offset = 30.56
#sigmoid_curve = []

    # This is the input data to be fit
reference_date = data['Date Time'].min()     # This is the minimum datetime value.
xdata = (data['Date Time'] - reference_date).dt.days.values     # xdata is the datetime value less the minimum datetime value, so its the elapsed time in days
ydata = data['SP 2']

    #  Parameters for the Logistic Function I tried as well
#The parameters optimized are L, x0, k, b, who are initially assigned in p0, the point the optimization starts from.
#  L is responsible for scaling the output range from [0,1] to [0,L]
#  b adds bias to the output and changes its range from [0,L] to [b,L+b]
#  k is responsible for scaling the input, which remains in (-inf,inf)
#  x0 is the point in the middle of the Sigmoid, i.e. the point where Sigmoid should originally output the value 1/2 [since if x=x0, we get 1/(1+exp(0)) = 1/2].

    #  The initial guesses used during the Logistic Function tries
    #       L              b         k      x0
#p0 = [max(ydata), np.median(xdata), 1, min(ydata)] # this is a mandatory initial guess
#p0 = [max(ydata), np.median(xdata), min(ydata)] # this is a mandatory initial guess

    #  I tried the Logistic Function as well but it didn't work any better, if you remove the # and add it to the other function, the below will run, change p0 as well.
#def mine_sub(x, L ,x0, k, b):
#    y = L / (1 + np.exp(-k*(x-x0))) + b
#    return (y)

    #  The initial guess of parameters taken from the code above that just drew the sigmoid curve beside the SP 2 elevations
p0 = [2.6, .2, 6.5, 30.56]

     #   Define the function with variables to be optimized
def mine_sub(t, rate, scale, time_offset, elev_offset):
  y = scale * (1 - (1 / (1 + np.exp(-(t - time_offset) / rate)))) + elev_offset
  return (y)

     #  Find the optimized parameters
popt, pcov = curve_fit(mine_sub, xdata, ydata, p0, method='dogbox')   # Try method='dogbox' also to see what difference it makes, other method='lm' or 'trf'

    #  Print the optimized parameters
print("Optimized parameters:", popt)

    #  Plot the results
plt.plot(xdata, ydata, 'o', label='data')
plt.plot(xdata, mine_sub(xdata, *popt), 'r-', label='fit')
plt.legend()
plt.show()

上面的结果如下:

优化参数:[ 2.60169647 0.0836988 6.49932141 30.65856154]

S形曲线的优化拟合

所以,这显然是不正确的。 在上面的代码中,我还尝试了逻辑函数,但它也会产生块状响应。 您将在上面的代码中找到注释掉的函数和 p0。

无论采用哪种方法,我都会得到块状响应,这似乎是一个数字处理问题,而不是函数方程不正确。

使用 Logistic 函数,我还收到“RuntimeWarning:exp 中遇到溢出”

因此,任何可以帮助我解决这个问题的人的任何想法将不胜感激。

最诚挚的问候, 格雷格

datetime sigmoid pcov popt
1个回答
0
投票

我使用由四个形状变量(x/y 尺度和 x/y 偏移)参数化的 sigmoid 得到了很好的拟合,并进行了一些有用的初始猜测 (

p0=
)。

enter image description here

Estimated parameters: [ 0.186 30.555 -0.001 -3.69 ]
def sigmoid(x, y_scale, y_shift, x_scale, x_shift):
    """Sigmoid(x) parameterised by four shape/offset variables

    Parameters
    ----------
    x : np.ndarray
        x-axis
    {x, y}_scale : float
        multiplied with the respective axis to scale or mirror it
    {x, y}_shift : float
        added to scaled x or final y to shift the curve's centre

    Returns
    ----------
    np.ndarray : Parameterized sigmoid's value at each x
    """
    return y_scale / (1 + np.exp(-x * x_scale + x_shift)) + y_shift

可重现的示例

数据:

             Date Time     SP 2
2  2009-08-06 08:00:00  30.7359
5  2010-07-28 08:00:00  30.7364
6  2011-06-13 08:00:00  30.7333
8  2013-08-08 08:00:00  30.7218
10 2015-07-20 08:00:00  30.7047
11 2016-07-08 08:00:00  30.6936
12 2017-08-01 08:00:00  30.6749
13 2018-07-30 08:00:00  30.6610
14 2019-07-11 08:00:00  30.6433
15 2020-07-01 08:00:00  30.6256
16 2021-06-29 08:00:00  30.6084
17 2022-07-11 08:00:00  30.5980
18 2023-08-01 08:00:00  30.5836
40 2024-07-26 08:00:00  30.5767

代码:

import pandas as pd
import numpy as np
from scipy.optimize import curve_fit

np.set_printoptions(suppress=True)

#
#Data to fit
#
from io import StringIO

data_str = """
,Date Time,SP 2
2, 2009-08-06 08:00:00, 30.7359
5, 2010-07-28 08:00:00, 30.7364
6, 2011-06-13 08:00:00, 30.7333
8, 2013-08-08 08:00:00, 30.7218
10, 2015-07-20 08:00:00, 30.7047
11, 2016-07-08 08:00:00, 30.6936
12, 2017-08-01 08:00:00, 30.6749
13, 2018-07-30 08:00:00, 30.6610
14, 2019-07-11 08:00:00, 30.6433
15, 2020-07-01 08:00:00, 30.6256
16, 2021-06-29 08:00:00, 30.6084
17, 2022-07-11 08:00:00, 30.5980
18, 2023-08-01 08:00:00, 30.5836
40, 2024-07-26 08:00:00, 30.5767
"""

#Read data into DataFrame
# - Convert dates
# - Add a 't = days since day 0' column
df = (
    pd.read_csv(StringIO(data_str), index_col=0, parse_dates=['Date Time'])
    .assign(days_since_day0=lambda df_: (df_['Date Time'] - df_['Date Time'].iat[0]).dt.days)
)
display(df)

#Visualise data
ax = df.plot(
    x='days_since_day0', y='SP 2',
    color='dodgerblue',
    marker='o', markeredgecolor='black', markeredgewidth=1.5,
    linestyle='none',
    figsize=(5, 3),
    ylabel='SP 2', label='SP 2 data'
)

#
# Define a parameterised sigmoid and fit to data
#
def sigmoid(x, y_scale, y_shift, x_scale, x_shift):
    """Sigmoid(x) parameterised by four shape/offset variables
    
    Parameters
    ----------
    x : np.ndarray
        x-axis
    {x, y}_scale : float
        multiplied with the respective axis to scale or mirror it
    {x, y}_shift : float
        added to scaled x or final y to shift the curve's centre

    Returns
    ----------
    np.ndarray : Parameterized sigmoid's value at each x
    """
    return y_scale / (1 + np.exp(-x * x_scale + x_shift)) + y_shift

#Some initial guesses for parameters
initial_guess = [1., df['SP 2'].mean(), 1 / df.days_since_day0.mean(), 1.]

#Fit curve to the data and report the parameters
params, _ = curve_fit(sigmoid, df['days_since_day0'], df['SP 2'], p0=initial_guess)
print('Estimated parameters:', params.round(3))

#Define a high-resolution x-axis to plot the curve over
x_fine = np.linspace(*df.days_since_day0.agg(['min', 'max']), num=100)

#Overlay fitted curve
ax.plot(
    x_fine, sigmoid(x_fine, *params),
    lw=5, alpha=0.1, color='crimson',
    label='fit',
)
ax.legend(ncols=2, framealpha=1, shadow=True)

#Optional formatting
ax.set_title('Sigmoid fitted to SP2 data')
ax.spines[['top', 'right']].set_visible(False)
ax.spines.bottom.set_bounds([0, 5000])
ax.spines.left.set_bounds([30.575, 30.725])
© www.soinside.com 2019 - 2024. All rights reserved.