Python, Pandas: I have a DataFrame containing datetimes and values.
# Create an empty DataFrame with 'timestamp' and 'value' columns
df = pd.DataFrame(columns=['timestamp', 'value'])
df.set_index('timestamp', inplace=True)
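Over time, I append data to this frame.
At some point, I want to look up the value at a particular timestamp. If that timestamp is already in df, great, it is easy to find.
But if the time I am looking for falls between two existing entries, how can I quickly locate it and interpolate between the two surrounding values? ChatGPT led me on a merry but fruitless chase through invalid comparisons.
Here is what I have tried so far, which does not work: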
# Check if the target timestamp exists in the DataFrame
timestamps = df.index
if target_timestamp in timestamps:
    # Exact match found
    return df.loc[target_timestamp, 'value']
else:
    # Use searchsorted to find the insertion point
    pos = timestamps.searchsorted(target_timestamp)
    if pos == 0 or pos == len(timestamps) - 1:
        raise ValueError("Target timestamp is out of bounds for interpolation")
    if target_timestamp > timestamps[pos]:
        previous_timestamp = timestamps[pos]
        next_timestamp = timestamps[pos + 1]
    else:
        previous_timestamp = timestamps[pos - 1]
        next_timestamp = timestamps[pos]
    # Interpolating the value
    previous_value = df.loc[previous_timestamp, 'value']
    next_value = df.loc[next_timestamp, 'value']
    # Linear interpolation formula
    interpolated_value = previous_value + (next_value - previous_value) * \
        (target_timestamp - previous_timestamp) / (next_timestamp - previous_timestamp)
    return interpolated_value
One option is to reindex the DataFrame onto a complete date range and then interpolate the values. Once that is done, you can filter by any date in the range.
import pandas as pd

# Sample data with a gap on 2022-01-03
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04"]),
        "value": [1, 2, 4],
    }
)

# Reindex onto a complete daily range, then fill the gaps by linear interpolation
out = (
    df.set_index("timestamp")
    .reindex(pd.date_range(df["timestamp"].min(), df["timestamp"].max()))
    .interpolate()
)

out.query("index == '2022-01-03'")
            value
2022-01-03    3.0
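If the timestamp you need does not fall on the regular grid, or you would rather not build the full range on every lookup, the searchsorted idea from the question can be made to work. With the default side="left", searchsorted returns the position of the first entry that is greater than or equal to the target, so the bracketing pair is always pos - 1 and pos. That means the `target_timestamp > timestamps[pos]` branch in the question can never be true, and the check `pos == len(timestamps) - 1` rejects valid targets that sit between the last two rows. The following is a minimal sketch, not a definitive implementation: the function name value_at is hypothetical, and it assumes a non-empty, sorted, duplicate-free DatetimeIndex and a numeric 'value' column.

```python
import pandas as pd


def value_at(df: pd.DataFrame, target) -> float:
    """Return df['value'] at `target`, linearly interpolating between neighbours.

    Assumes (not checked here) that df.index is a sorted DatetimeIndex
    without duplicates. `value_at` is an illustrative name, not a pandas API.
    """
    target = pd.Timestamp(target)
    idx = df.index

    # Refuse to extrapolate outside the covered time span
    if len(idx) == 0 or target < idx[0] or target > idx[-1]:
        raise ValueError("Target timestamp is out of bounds for interpolation")

    # Position of the first entry >= target (side="left" is the default)
    pos = idx.searchsorted(target)
    if idx[pos] == target:
        # Exact match, no interpolation needed
        return float(df["value"].iloc[pos])

    # target lies strictly between idx[pos - 1] and idx[pos]
    t0, t1 = idx[pos - 1], idx[pos]
    v0, v1 = df["value"].iloc[pos - 1], df["value"].iloc[pos]
    weight = (target - t0) / (t1 - t0)  # Timedelta / Timedelta gives a float
    return float(v0 + (v1 - v0) * weight)


df = pd.DataFrame(
    {"value": [1.0, 2.0, 4.0]},
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04"]),
)
df.index.name = "timestamp"

on_grid = value_at(df, "2022-01-03")         # midway between 2.0 and 4.0
off_grid = value_at(df, "2022-01-02 12:00")  # a quarter of the way from 2.0 to 4.0
```

Because the lookup is a binary search followed by arithmetic on two rows, it avoids materialising the reindexed frame and weights the result by actual elapsed time, which matters when the rows are unevenly spaced. If rows are appended out of order, call df.sort_index() before looking values up.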