I have OHLC stock data in a Polars dataframe, and for each day I want to compute the expanding maximum of the close price between two times:
import datetime

start = '09:15'
end = '10:15'
start_time = datetime.time.fromisoformat(start)
end_time = datetime.time.fromisoformat(end)
# Filter and calculate the expanding max within the time range for each day
max_values = und_df2.filter((und_df2['timestamp'].dt.time() >= start_time) & (und_df2['timestamp'].dt.time() <= end_time)) \
.group_by_dynamic('timestamp', every='1d', closed='left') \
.agg(pl.col('Close').cum_max().alias('fbarh3'))
# df.with_columns(pl.col(date_col).dt.truncate("1d").alias("date")) \
# .join(
# df.group_by_dynamic("timestamp", every="1d").agg(agg_func.alias(alias))
# .with_columns(pl.col(alias).shift(shift_value))
# ,
# left_on="date", right_on="timestamp", how='left'
# ) \
# .drop("date")
#Join the maximum values back to the original dataframe
und_df2 = und_df2.with_columns(pl.col('timestamp').dt.truncate('1d').alias('date')).join(max_values, left_on='date', right_on='timestamp' , how='left' ).drop('date')
max_values

The calculation is correct, in that it expands the highest close between the start and end times, but because I filtered the dataframe down to those specific times, I lose the time component after the aggregation.
# This is the max_values generated:
shape: (824, 2)
┌─────────────────────┬──────────────────────────────────┐
│ timestamp           ┆ fbarh3                           │
│ ---                 ┆ ---                              │
│ datetime[μs]        ┆ list[f64]                        │
╞═════════════════════╪══════════════════════════════════╡
│ 2020-03-02 00:00:00 ┆ [11356.8, 11358.45, … 11388.65]  │
│ 2020-03-03 00:00:00 ┆ [11293.65, 11293.95, … 11310.7]  │
│ 2020-03-04 00:00:00 ┆ [11296.95, 11296.95, … 11325.25] │
│ 2020-03-05 00:00:00 ┆ [11312.65, 11312.65, … 11312.65] │
│ 2020-03-06 00:00:00 ┆ [10894.8, 10914.85, … 10965.3]   │
│ …                   ┆ …                                │
└─────────────────────┴──────────────────────────────────┘
Each day the output should keep expanding (a cum_max of the highest close) from start to end, but I can't seem to join max_values back into und_df2.
Your approach is basically correct, but you don't need group_by_dynamic. You can do it this way (with an example dataframe here; in the future, please post sample data when asking for help). I have printed every step so you can follow what is being done:
import polars as pl
from datetime import time
data = {
'timestamp': [
'2023-06-01 09:00', '2023-06-01 09:15', '2023-06-01 09:30', '2023-06-01 09:45', '2023-06-01 10:00', '2023-06-01 10:15',
'2023-06-01 10:30', '2023-06-01 10:45', '2023-06-01 11:00', '2023-06-01 11:15',
'2023-06-02 09:00', '2023-06-02 09:15', '2023-06-02 09:30', '2023-06-02 09:45', '2023-06-02 10:00', '2023-06-02 10:15',
'2023-06-02 10:30', '2023-06-02 10:45', '2023-06-02 11:00', '2023-06-02 11:15'
],
'Close': [
100, 101, 102, 103, 104, 105,
106, 107, 108, 109,
110, 111, 112, 113, 114, 115,
116, 117, 118, 119
]
}
und_df2 = pl.DataFrame(data)
und_df2 = und_df2.with_columns(pl.col('timestamp').str.strptime(pl.Datetime, format='%Y-%m-%d %H:%M'))
start = '09:15'
end = '10:15'
start_time = time.fromisoformat(start)
end_time = time.fromisoformat(end)
und_df2 = und_df2.sort('timestamp')
filtered_df = und_df2.filter((und_df2['timestamp'].dt.time() >= start_time) & (und_df2['timestamp'].dt.time() <= end_time))
print("Filtered DataFrame:")
print(filtered_df)
filtered_df = filtered_df.with_columns(pl.col('timestamp').dt.truncate('1d').alias('date'))
# maintain_order=True keeps the groups in order of appearance, so the
# exploded rows line up one-to-one with filtered_df below
max_values = filtered_df.group_by('date', maintain_order=True).agg(pl.col('Close').cum_max().alias('fbarh3'))
max_values = max_values.explode('fbarh3')
max_values = max_values.with_columns([
    filtered_df['timestamp']
])
print("Max Values with Date and Time:")
print(max_values)
und_df2 = und_df2.with_columns([
pl.col('timestamp').dt.truncate('1d').alias('date'),
pl.col('timestamp').dt.time().alias('time')
])
print("Original DataFrame with Date and Time:")
print(und_df2)
result_df = und_df2.join(max_values, on=['date', 'timestamp'], how='left')
result_df = result_df.drop(['date', 'time'])
print("Result DataFrame:")
print(result_df)
Resulting in:
Filtered DataFrame:
shape: (10, 2)
┌─────────────────────┬───────┐
│ timestamp ┆ Close │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═══════╡
│ 2023-06-01 09:15:00 ┆ 101 │
│ 2023-06-01 09:30:00 ┆ 102 │
│ 2023-06-01 09:45:00 ┆ 103 │
│ 2023-06-01 10:00:00 ┆ 104 │
│ 2023-06-01 10:15:00 ┆ 105 │
│ 2023-06-02 09:15:00 ┆ 111 │
│ 2023-06-02 09:30:00 ┆ 112 │
│ 2023-06-02 09:45:00 ┆ 113 │
│ 2023-06-02 10:00:00 ┆ 114 │
│ 2023-06-02 10:15:00 ┆ 115 │
└─────────────────────┴───────┘
Max Values with Date and Time:
shape: (10, 3)
┌─────────────────────┬────────┬─────────────────────┐
│ date ┆ fbarh3 ┆ timestamp │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ datetime[μs] │
╞═════════════════════╪════════╪═════════════════════╡
│ 2023-06-01 00:00:00 ┆ 101 ┆ 2023-06-01 09:15:00 │
│ 2023-06-01 00:00:00 ┆ 102 ┆ 2023-06-01 09:30:00 │
│ 2023-06-01 00:00:00 ┆ 103 ┆ 2023-06-01 09:45:00 │
│ 2023-06-01 00:00:00 ┆ 104 ┆ 2023-06-01 10:00:00 │
│ 2023-06-01 00:00:00 ┆ 105 ┆ 2023-06-01 10:15:00 │
│ 2023-06-02 00:00:00 ┆ 111 ┆ 2023-06-02 09:15:00 │
│ 2023-06-02 00:00:00 ┆ 112 ┆ 2023-06-02 09:30:00 │
│ 2023-06-02 00:00:00 ┆ 113 ┆ 2023-06-02 09:45:00 │
│ 2023-06-02 00:00:00 ┆ 114 ┆ 2023-06-02 10:00:00 │
│ 2023-06-02 00:00:00 ┆ 115 ┆ 2023-06-02 10:15:00 │
└─────────────────────┴────────┴─────────────────────┘
Original DataFrame with Date and Time:
shape: (20, 4)
┌─────────────────────┬───────┬─────────────────────┬──────────┐
│ timestamp ┆ Close ┆ date ┆ time │
│ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ datetime[μs] ┆ time │
╞═════════════════════╪═══════╪═════════════════════╪══════════╡
│ 2023-06-01 09:00:00 ┆ 100 ┆ 2023-06-01 00:00:00 ┆ 09:00:00 │
│ 2023-06-01 09:15:00 ┆ 101 ┆ 2023-06-01 00:00:00 ┆ 09:15:00 │
│ 2023-06-01 09:30:00 ┆ 102 ┆ 2023-06-01 00:00:00 ┆ 09:30:00 │
│ 2023-06-01 09:45:00 ┆ 103 ┆ 2023-06-01 00:00:00 ┆ 09:45:00 │
│ 2023-06-01 10:00:00 ┆ 104 ┆ 2023-06-01 00:00:00 ┆ 10:00:00 │
│ … ┆ … ┆ … ┆ … │
│ 2023-06-02 10:15:00 ┆ 115 ┆ 2023-06-02 00:00:00 ┆ 10:15:00 │
│ 2023-06-02 10:30:00 ┆ 116 ┆ 2023-06-02 00:00:00 ┆ 10:30:00 │
│ 2023-06-02 10:45:00 ┆ 117 ┆ 2023-06-02 00:00:00 ┆ 10:45:00 │
│ 2023-06-02 11:00:00 ┆ 118 ┆ 2023-06-02 00:00:00 ┆ 11:00:00 │
│ 2023-06-02 11:15:00 ┆ 119 ┆ 2023-06-02 00:00:00 ┆ 11:15:00 │
└─────────────────────┴───────┴─────────────────────┴──────────┘
Result DataFrame:
shape: (20, 3)
┌─────────────────────┬───────┬────────┐
│ timestamp ┆ Close ┆ fbarh3 │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ i64 ┆ i64 │
╞═════════════════════╪═══════╪════════╡
│ 2023-06-01 09:00:00 ┆ 100 ┆ null │
│ 2023-06-01 09:15:00 ┆ 101 ┆ 101 │
│ 2023-06-01 09:30:00 ┆ 102 ┆ 102 │
│ 2023-06-01 09:45:00 ┆ 103 ┆ 103 │
│ 2023-06-01 10:00:00 ┆ 104 ┆ 104 │
│ … ┆ … ┆ … │
│ 2023-06-02 10:15:00 ┆ 115 ┆ 115 │
│ 2023-06-02 10:30:00 ┆ 116 ┆ null │
│ 2023-06-02 10:45:00 ┆ 117 ┆ null │
│ 2023-06-02 11:00:00 ┆ 118 ┆ null │
│ 2023-06-02 11:15:00 ┆ 119 ┆ null │
└─────────────────────┴───────┴────────┘