我有一个日期时间和海拔数据集(下面是实际的现场数据),它具有 sigmoid 形状,我想为其拟合 sigmoid 曲线,优化参数以获得最佳拟合。 这对我来说似乎是直截了当的前言,但我已经陷入了僵局,我无法摆脱它。 任何人可以提供的任何帮助将不胜感激。
数据如下: | | 日期时间 | SP 2 | |:-- |:----------------|:-------- | |2 |2009-08-06 08:00:00|30.7359 |5 |2010-07-28 08:00:00|30.7364 |6 |2011-06-13 08:00:00|30.7333 |8 |2013-08-08 08:00:00|30.7218 |10|2015-07-20 08:00:00|30.7047 |11|2016-07-08 08:00:00|30.6936 |12|2017-08-01 08:00:00|30.6749 |13|2018-07-30 08:00:00|30.6610 |14|2019-07-11 08:00:00|30.6433 |15|2020-07-01 08:00:00|30.6256 |16|2021-06-29 08:00:00|30.6084 |17|2022-07-11 08:00:00|30.5980 |18|2023-08-01 08:00:00|30.5836 |40|2024-07-26 08:00:00|30.5767
上面是从更大的数据帧(df_LS_MON)绘制的以下代码生成的数据帧:
# Input data. This is the actual field data. Datetime of acquisition,
# and elevation in meters of a monument SP 2 from a high precision level survey.
data = pd.DataFrame()
data.loc[:,'Date Time'] = df_LS_MON['Date Time']
data.loc[:,'SP 2'] = df_LS_MON['SP 2']
print(data)
我可以绘制一条类似于数据的 sigmoid 曲线,但它没有对此进行优化,请参见下文:
# Function to predict mine subsidence trend:
# This doesn't optimize the sigmoid curve fit, it just draws a curve from
parameters I give it,
# but it gets me nearby the correct parameters.
# Convert 'Date Time' to numerical representation (e.g., days since a reference
date)
# Choose a reference date (e.g., the first date in the dataset)
reference_date = df_LS_MON['Date Time'].min()
# Calculate days since reference date
t = (df_LS_MON['Date Time'] - reference_date).dt.days.values
# The input to draw the sigmoid curve
sig_rate = 2.6 # Lower is narrower horizontally, higher is broader horizontally
sig_scale = .2 # Lower is narrower vertically, higher is broader vertically
t_OS = 6.5 # Consider adjusting this based on the reference date, Lower draws
the curve up and left, higher extends the curve down and right
elev_offset = 30.56 # The offset to the Y direction (elevation) to move from near
0 to near the data
# A list to hold the output
sigmoid_curve = []
# Define the function to calculate the sigmoid curve from the input values above
def mine_sub(t, rate=sig_rate, scale=sig_scale, time_offset=t_OS):
y = (scale * (1 - (1 / (1 + np.exp(-(t - time_offset) / rate))))) + elev_offset
return (y)
"""
Calculates the sigmoid function.
Args:
t: The input value or an array of values (time, represented as days since a
reference date).
rate: The rate of change of the sigmoid function.
scale: The scale of the sigmoid function.
time_offset: The time offset of the sigmoid function.
Returns:
The sigmoid function output.
"""
# Call the function and populate the list sigmoid_curve with the output
for i in range (len(t)):
mine_sub(t=i)
sigmoid_curve.append(mine_sub(t=i))
# Put the function output in the 'data' dataframe
sigmoid_floats = [float(sigmoid_curve) for sigmoid_curve in sigmoid_curve]
data['Sig_Curve'] = sigmoid_floats
# Plot the data and the sigmoid curve drawn from the function output
plt.figure(figsize=(10, 5), dpi=150)
plt.plot(data['Date Time'], data['SP 2'], label='SP 2')
plt.plot(data['Date Time'], data['Sig_Curve'], label='Sigmoid Curve')
plt.title('data SP 2 (Only End of July Surveys) with the Sigmoid Curve')
plt.legend(loc='upper right')
plt.xlim(dt.datetime(2008,1,1), dt.datetime(2024,12,31))
plt.show()
# Displpay the function output in sigmoid_curve and the time input (elapsed
days)
display(data)
display(t)
上面的结果如下:
日期时间 | SP 2 | Sig_曲线 | |
---|---|---|---|
2 | 2009-08-06 08:00:00 | 30.7359 | 30.744828 |
5 | 2010-07-28 08:00:00 | 30.7364 | 30.738478 |
6 | 2011-06-13 08:00:00 | 30.7333 | 30.729902 |
8 | 2013-08-08 08:00:00 | 30.7218 | 30.718700 |
10 | 2015-07-20 08:00:00 | 30.7047 | 30.704686 |
11 | 2016-07-08 08:00:00 | 30.6936 | 30.688072 |
12 | 2017-08-01 08:00:00 | 30.6749 | 30.669586 |
13 | 2018-07-30 08:00:00 | 30.6610 | 30.650414 |
14 | 2019-07-11 08:00:00 | 30.6433 | 30.631928 |
15 | 2020-07-01 08:00:00 | 30.6256 | 30.615314 |
16 | 2021-06-29 08:00:00 | 30.6084 | 30.601300 |
17 | 2022-07-11 08:00:00 | 30.5980 | 30.590098 |
18 | 2023-08-01 08:00:00 | 30.5836 | 30.581522 |
40 | 2024-07-26 08:00:00 | 30.5767 | 30.575172 |
以下是数组 t,自第一个日期以来经过的天数
数组([ 0, 356, 676, 1463, 2174, 2528, 2917, 3280, 3626, 3982, 4345, 4722、5108、5468])
因此,优化后的 sigmoid 曲线可能应该类似于上面的内容,这就是我优化最适合数据的曲线的起点。
以下是我尝试用来优化数据拟合的代码。 我大量借鉴了之前的文章,使用 Python 将 sigmoid 函数(“S”形曲线)拟合到数据,它确实很有帮助,我在优化方面也遇到了类似的问题,但我找不到方法解决了这个问题。
# This code is to optimize the sigmoid curve parameters to find the best fit to the SP 2 elevation data
# Starting input parameters that should be close, and are taken from the code above that just drew the sigmoid curve
#sig_rate = 2.6
#sig_scale = .2
#t_OS = 6.5 # Consider adjusting this based on your reference date
#elev_offset = 30.56
#sigmoid_curve = []
# This is the input data to be fit
reference_date = data['Date Time'].min() # This is the minimum datetime value.
xdata = (data['Date Time'] - reference_date).dt.days.values # xdata is the datetime value less the minimum datetime value, so its the elapsed time in days
ydata = data['SP 2']
# Parameters for the Logistic Function I tried as well
#The parameters optimized are L, x0, k, b, who are initially assigned in p0, the point the optimization starts from.
# L is responsible for scaling the output range from [0,1] to [0,L]
# b adds bias to the output and changes its range from [0,L] to [b,L+b]
# k is responsible for scaling the input, which remains in (-inf,inf)
# x0 is the point in the middle of the Sigmoid, i.e. the point where Sigmoid should originally output the value 1/2 [since if x=x0, we get 1/(1+exp(0)) = 1/2].
# The initial guesses used during the Logistic Function tries
# L b k x0
#p0 = [max(ydata), np.median(xdata), 1, min(ydata)] # this is a mandatory initial guess
#p0 = [max(ydata), np.median(xdata), min(ydata)] # this is a mandatory initial guess
# I tried the Logistic Function as well but it didn't work any better, if you remove the # and add it to the other function, the below will run, change p0 as well.
#def mine_sub(x, L ,x0, k, b):
# y = L / (1 + np.exp(-k*(x-x0))) + b
# return (y)
# The initial guess of parameters taken from the code above that just drew the sigmoid curve beside the SP 2 elevations
p0 = [2.6, .2, 6.5, 30.56]
# Define the function with variables to be optimized
def mine_sub(t, rate, scale, time_offset, elev_offset):
y = scale * (1 - (1 / (1 + np.exp(-(t - time_offset) / rate)))) + elev_offset
return (y)
# Find the optimized parameters
popt, pcov = curve_fit(mine_sub, xdata, ydata, p0, method='dogbox') # Try method='dogbox' also to see what difference it makes, other method='lm' or 'trf'
# Print the optimized parameters
print("Optimized parameters:", popt)
# Plot the results
plt.plot(xdata, ydata, 'o', label='data')
plt.plot(xdata, mine_sub(xdata, *popt), 'r-', label='fit')
plt.legend()
plt.show()
上面的结果如下:
优化参数:[ 2.60169647 0.0836988 6.49932141 30.65856154]
所以,这显然是不正确的。 在上面的代码中,我还尝试了逻辑函数,但它也会产生块状响应。 您将在上面的代码中找到注释掉的函数和 p0。
无论采用哪种方法,我都会得到块状响应,这似乎是一个数字处理问题,而不是函数方程不正确。
使用 Logistic 函数,我还收到“RuntimeWarning:exp 中遇到溢出”
因此,任何可以帮助我解决这个问题的人的任何想法将不胜感激。
最诚挚的问候, 格雷格
我使用由四个形状变量(x/y 尺度和 x/y 偏移)参数化的 sigmoid 得到了很好的拟合,并进行了一些有用的初始猜测 (
p0=
)。
Estimated parameters: [ 0.186 30.555 -0.001 -3.69 ]
def sigmoid(x, y_scale, y_shift, x_scale, x_shift):
"""Sigmoid(x) parameterised by four shape/offset variables
Parameters
----------
x : np.ndarray
x-axis
{x, y}_scale : float
multiplied with the respective axis to scale or mirror it
{x, y}_shift : float
added to scaled x or final y to shift the curve's centre
Returns
----------
np.ndarray : Parameterized sigmoid's value at each x
"""
return y_scale / (1 + np.exp(-x * x_scale + x_shift)) + y_shift
数据:
Date Time SP 2
2 2009-08-06 08:00:00 30.7359
5 2010-07-28 08:00:00 30.7364
6 2011-06-13 08:00:00 30.7333
8 2013-08-08 08:00:00 30.7218
10 2015-07-20 08:00:00 30.7047
11 2016-07-08 08:00:00 30.6936
12 2017-08-01 08:00:00 30.6749
13 2018-07-30 08:00:00 30.6610
14 2019-07-11 08:00:00 30.6433
15 2020-07-01 08:00:00 30.6256
16 2021-06-29 08:00:00 30.6084
17 2022-07-11 08:00:00 30.5980
18 2023-08-01 08:00:00 30.5836
40 2024-07-26 08:00:00 30.5767
代码:
import pandas as pd
import numpy as np
from scipy.optimize import curve_fit
np.set_printoptions(suppress=True)
#
#Data to fit
#
from io import StringIO
data_str = """
,Date Time,SP 2
2, 2009-08-06 08:00:00, 30.7359
5, 2010-07-28 08:00:00, 30.7364
6, 2011-06-13 08:00:00, 30.7333
8, 2013-08-08 08:00:00, 30.7218
10, 2015-07-20 08:00:00, 30.7047
11, 2016-07-08 08:00:00, 30.6936
12, 2017-08-01 08:00:00, 30.6749
13, 2018-07-30 08:00:00, 30.6610
14, 2019-07-11 08:00:00, 30.6433
15, 2020-07-01 08:00:00, 30.6256
16, 2021-06-29 08:00:00, 30.6084
17, 2022-07-11 08:00:00, 30.5980
18, 2023-08-01 08:00:00, 30.5836
40, 2024-07-26 08:00:00, 30.5767
"""
#Read data into DataFrame
# - Convert dates
# - Add a 't = days since day 0' column
df = (
pd.read_csv(StringIO(data_str), index_col=0, parse_dates=['Date Time'])
.assign(days_since_day0=lambda df_: (df_['Date Time'] - df_['Date Time'].iat[0]).dt.days)
)
display(df)
#Visualise data
ax = df.plot(
x='days_since_day0', y='SP 2',
color='dodgerblue',
marker='o', markeredgecolor='black', markeredgewidth=1.5,
linestyle='none',
figsize=(5, 3),
ylabel='SP 2', label='SP 2 data'
)
#
# Define a parameterised sigmoid and fit to data
#
def sigmoid(x, y_scale, y_shift, x_scale, x_shift):
"""Sigmoid(x) parameterised by four shape/offset variables
Parameters
----------
x : np.ndarray
x-axis
{x, y}_scale : float
multiplied with the respective axis to scale or mirror it
{x, y}_shift : float
added to scaled x or final y to shift the curve's centre
Returns
----------
np.ndarray : Parameterized sigmoid's value at each x
"""
return y_scale / (1 + np.exp(-x * x_scale + x_shift)) + y_shift
#Some initial guesses for parameters
initial_guess = [1., df['SP 2'].mean(), 1 / df.days_since_day0.mean(), 1.]
#Fit curve to the data and report the parameters
params, _ = curve_fit(sigmoid, df['days_since_day0'], df['SP 2'], p0=initial_guess)
print('Estimated parameters:', params.round(3))
#Define a high-resolution x-axis to plot the curve over
x_fine = np.linspace(*df.days_since_day0.agg(['min', 'max']), num=100)
#Overlay fitted curve
ax.plot(
x_fine, sigmoid(x_fine, *params),
lw=5, alpha=0.1, color='crimson',
label='fit',
)
ax.legend(ncols=2, framealpha=1, shadow=True)
#Optional formatting
ax.set_title('Sigmoid fitted to SP2 data')
ax.spines[['top', 'right']].set_visible(False)
ax.spines.bottom.set_bounds([0, 5000])
ax.spines.left.set_bounds([30.575, 30.725])