How do I find the rows on either side of a given value?


Using Python and pandas, I have a DataFrame that holds datetimes and values.

import pandas as pd

# Create an empty DataFrame with 'timestamp' and 'value' columns
df = pd.DataFrame(columns=['timestamp', 'value'])
df.set_index('timestamp', inplace=True)

Over time, I append data to this frame.
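For context, here is a minimal sketch of what that appending might look like. The question does not show it, so the helper name `append_reading`, the sample readings, and the choice of `pd.concat` are all assumptions. The frame is re-sorted after each append because the `searchsorted` lookup used later only works on a sorted index.

```python
import pandas as pd


def append_reading(df, timestamp, value):
    """Hypothetical helper: return df plus one (timestamp, value) row, sorted by time."""
    row = pd.DataFrame(
        {"value": [float(value)]},
        index=pd.DatetimeIndex([pd.Timestamp(timestamp)], name="timestamp"),
    )
    if df.empty:
        # Nothing to concatenate with yet
        return row
    return pd.concat([df, row]).sort_index()


# Start from an empty frame with a typed DatetimeIndex
df = pd.DataFrame(
    {"value": pd.Series(dtype="float64")},
    index=pd.DatetimeIndex([], name="timestamp"),
)

# Sample readings (made up), deliberately appended out of order
df = append_reading(df, "2022-01-02", 2.0)
df = append_reading(df, "2022-01-01", 1.0)
df = append_reading(df, "2022-01-04", 4.0)
```

Appending row by row copies the frame each time, so for large volumes it is usually cheaper to collect readings in a list and build the DataFrame once.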

At some point, I want to find the value at a particular timestamp. If that timestamp is already in df, great, it is easy to look up.

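But if the time I'm looking for falls between two existing values, how can I quickly locate it and interpolate between the two bracketing values? ChatGPT led me on a merry chase through invalid comparisons that went nowhere.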

Here is what I have tried so far; it does not work:

                # Check if the target timestamp exists in the DataFrame
                timestamps = df.index
                if target_timestamp in timestamps:
                    # Exact match found
                    return df.loc[target_timestamp, 'value']
                else:
                    # Use searchsorted to find the insertion point
                    pos = timestamps.searchsorted(target_timestamp)

                    if pos == 0 or pos == len(timestamps) - 1:
                        raise ValueError("Target timestamp is out of bounds for interpolation")

                    if target_timestamp > timestamps[pos]:
                        previous_timestamp = timestamps[pos]
                        next_timestamp = timestamps[pos + 1]
                    else:
                        previous_timestamp = timestamps[pos - 1]
                        next_timestamp = timestamps[pos]

                    # Interpolating the value
                    previous_value = df.loc[previous_timestamp, 'value']
                    next_value = df.loc[next_timestamp, 'value']

                    # Linear interpolation formula
                    interpolated_value = previous_value + (next_value - previous_value) * \
                                        (target_timestamp - previous_timestamp) / (next_timestamp - previous_timestamp)

                    return interpolated_value
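Two things in the attempt above look off. The bounds check compares `pos` with `len(timestamps) - 1`, which rejects valid targets between the last two rows and lets `pos == len(timestamps)` through to an `IndexError`. And once an exact match is ruled out, `searchsorted` already returns the position of the upper neighbour, so `target_timestamp > timestamps[pos]` can never be true. A corrected sketch follows. It assumes a sorted, unique DatetimeIndex and a numeric 'value' column; the function name `value_at` is made up for illustration.

```python
import pandas as pd


def value_at(df, target_timestamp):
    """Return 'value' at target_timestamp, interpolating linearly between neighbours.

    Assumes df has a sorted, unique DatetimeIndex and a numeric 'value' column.
    """
    timestamps = df.index
    # Normalise strings and datetimes to a Timestamp so comparisons are valid
    target = pd.Timestamp(target_timestamp)

    # Default side="left": pos is the first position whose timestamp is >= target
    pos = timestamps.searchsorted(target)

    # Exact match found
    if pos < len(timestamps) and timestamps[pos] == target:
        return df["value"].iloc[pos]

    # Target lies before the first row or after the last one
    if pos == 0 or pos == len(timestamps):
        raise ValueError("Target timestamp is out of bounds for interpolation")

    previous_timestamp, next_timestamp = timestamps[pos - 1], timestamps[pos]
    previous_value = df["value"].iloc[pos - 1]
    next_value = df["value"].iloc[pos]

    # Timedelta / Timedelta yields a plain float between 0 and 1
    fraction = (target - previous_timestamp) / (next_timestamp - previous_timestamp)
    return previous_value + (next_value - previous_value) * fraction


df = pd.DataFrame(
    {"value": [1.0, 2.0, 4.0]},
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04"]),
)
print(value_at(df, "2022-01-03"))        # 3.0
print(value_at(df, "2022-01-01 12:00"))  # 1.5
```

Converting the target with `pd.Timestamp` up front may also avoid the kind of invalid comparison mentioned earlier, if the target was being passed as a string or a plain `datetime`.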

python pandas search interpolation
1 Answer

One option is to reindex the DataFrame over the full date range and then interpolate the values. Once that is done, you can filter by any date within the range.

import pandas as pd

# Sample data; the timestamps are parsed into datetime64 values
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04"]),
        "value": [1, 2, 4],
    }
)

out = (
    df.set_index("timestamp")
    .reindex(pd.date_range(df["timestamp"].min(), df["timestamp"].max()))
    .interpolate()
)

out.query("index == '2022-01-03'")
            value
2022-01-03    3.0
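The daily range above only covers targets that fall exactly on a day boundary. For an arbitrary timestamp, one possible variant (an extension, not part of the answer as posted) is to add just the target to the index and interpolate with `method="time"`, which weights by elapsed time. The default `linear` method ignores the index and treats rows as equally spaced. The target value below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1.0, 2.0, 4.0]},
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04"]),
)

# Hypothetical target that is not on the daily grid
target = pd.Timestamp("2022-01-03 06:00")

out = (
    # Insert the target into the (sorted) index; its value starts as NaN
    df.reindex(df.index.union(pd.DatetimeIndex([target])))
    # Fill the NaN in proportion to the time elapsed between the neighbours
    .interpolate(method="time")
)

result = out.loc[target, "value"]
print(result)  # 3.25 (30 of the 48 hours between 2022-01-02 and 2022-01-04)
```

This rebuilds the whole frame for a single lookup, so for frequent queries on a large frame a `searchsorted`-based lookup is likely cheaper.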