Expanding function with daily grouping in Polars


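I have OHLC stock data in a Polars DataFrame, and for each day I want to compute the expanding maximum of the close price between two times of day: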

import datetime
import polars as pl

start = '09:15'
end = '10:15'
start_time = datetime.time.fromisoformat(start)
end_time = datetime.time.fromisoformat(end)


 # Filter and calculate the expanding max within the time range for each day

max_values = und_df2.filter((und_df2['timestamp'].dt.time() >= start_time) & (und_df2['timestamp'].dt.time() <= end_time)) \
        .group_by_dynamic('timestamp', every='1d', closed='left') \
        .agg(pl.col('Close').cum_max().alias('fbarh3')) 


# df.with_columns(pl.col(date_col).dt.truncate("1d").alias("date")) \
#              .join(
#                  df.group_by_dynamic("timestamp", every="1d").agg(agg_func.alias(alias))
#                    .with_columns(pl.col(alias).shift(shift_value))
#                    ,
#                  left_on="date", right_on="timestamp", how='left'
#              ) \
#              .drop("date")


#Join the maximum values back to the original dataframe

und_df2 = und_df2.with_columns(pl.col('timestamp').dt.truncate('1d').alias('date')).join(max_values, left_on='date', right_on='timestamp' ,  how='left' ).drop('date')

max_values
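The calculation itself is correct: it gives the expanding maximum of the close between the start and end times. But because I filter the dataframe down to those specific times and then aggregate per day, I lose the time component after the aggregation.
How do I get the result back into the original frame (und_df2)?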

#== this is max_values generated 

shape: (824, 2)
timestamp            fbarh3
datetime[μs]         list[f64]
2020-03-02 00:00:00  [11356.8, 11358.45, … 11388.65]
2020-03-03 00:00:00  [11293.65, 11293.95, … 11310.7]
2020-03-04 00:00:00  [11296.95, 11296.95, … 11325.25]
2020-03-05 00:00:00  [11312.65, 11312.65, … 11312.65]
2020-03-06 00:00:00  [10894.8, 10914.85, … 10965.3]
For each day, from start to end, the output should be the expanding highest value (cum_max). I can't seem to join max_values back onto und_df2.

python dataframe python-polars rust-polars
1 Answer

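Your approach is basically right, but you don't need group_by_dynamic. You can do it as follows. I'm using a sample dataframe here, so in future please post sample data when asking for help. I print every intermediate step so you can see what is being done: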

import polars as pl
from datetime import time

data = {
    'timestamp': [
        '2023-06-01 09:00', '2023-06-01 09:15', '2023-06-01 09:30', '2023-06-01 09:45', '2023-06-01 10:00', '2023-06-01 10:15',
        '2023-06-01 10:30', '2023-06-01 10:45', '2023-06-01 11:00', '2023-06-01 11:15',
        '2023-06-02 09:00', '2023-06-02 09:15', '2023-06-02 09:30', '2023-06-02 09:45', '2023-06-02 10:00', '2023-06-02 10:15',
        '2023-06-02 10:30', '2023-06-02 10:45', '2023-06-02 11:00', '2023-06-02 11:15'
    ],
    'Close': [
        100, 101, 102, 103, 104, 105,
        106, 107, 108, 109,
        110, 111, 112, 113, 114, 115,
        116, 117, 118, 119
    ]
}

und_df2 = pl.DataFrame(data)
und_df2 = und_df2.with_columns(pl.col('timestamp').str.strptime(pl.Datetime, format='%Y-%m-%d %H:%M'))

start = '09:15'
end = '10:15'
start_time = time.fromisoformat(start)
end_time = time.fromisoformat(end)

und_df2 = und_df2.sort('timestamp')
filtered_df = und_df2.filter((und_df2['timestamp'].dt.time() >= start_time) & (und_df2['timestamp'].dt.time() <= end_time))

print("Filtered DataFrame:")
print(filtered_df)

filtered_df = filtered_df.with_columns(pl.col('timestamp').dt.truncate('1d').alias('date'))

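# maintain_order=True keeps the groups in input order, so the exploded rows
# line up with filtered_df['timestamp'] when it is attached below
max_values = filtered_df.group_by('date', maintain_order=True).agg(pl.col('Close').cum_max().alias('fbarh3'))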
max_values = max_values.explode('fbarh3')
max_values = max_values.with_columns([
    filtered_df['timestamp']
])

print("Max Values with Date and Time:")
print(max_values)

und_df2 = und_df2.with_columns([
    pl.col('timestamp').dt.truncate('1d').alias('date'),
    pl.col('timestamp').dt.time().alias('time')
])

print("Original DataFrame with Date and Time:")
print(und_df2)

result_df = und_df2.join(max_values, on=['date', 'timestamp'], how='left')
result_df = result_df.drop(['date', 'time'])

print("Result DataFrame:")
print(result_df)

This results in:

Filtered DataFrame:
shape: (10, 2)
┌─────────────────────┬───────┐
│ timestamp           ┆ Close │
│ ---                 ┆ ---   │
│ datetime[μs]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2023-06-01 09:15:00 ┆ 101   │
│ 2023-06-01 09:30:00 ┆ 102   │
│ 2023-06-01 09:45:00 ┆ 103   │
│ 2023-06-01 10:00:00 ┆ 104   │
│ 2023-06-01 10:15:00 ┆ 105   │
│ 2023-06-02 09:15:00 ┆ 111   │
│ 2023-06-02 09:30:00 ┆ 112   │
│ 2023-06-02 09:45:00 ┆ 113   │
│ 2023-06-02 10:00:00 ┆ 114   │
│ 2023-06-02 10:15:00 ┆ 115   │
└─────────────────────┴───────┘
Max Values with Date and Time:
shape: (10, 3)
┌─────────────────────┬────────┬─────────────────────┐
│ date                ┆ fbarh3 ┆ timestamp           │
│ ---                 ┆ ---    ┆ ---                 │
│ datetime[μs]        ┆ i64    ┆ datetime[μs]        │
╞═════════════════════╪════════╪═════════════════════╡
│ 2023-06-01 00:00:00 ┆ 101    ┆ 2023-06-01 09:15:00 │
│ 2023-06-01 00:00:00 ┆ 102    ┆ 2023-06-01 09:30:00 │
│ 2023-06-01 00:00:00 ┆ 103    ┆ 2023-06-01 09:45:00 │
│ 2023-06-01 00:00:00 ┆ 104    ┆ 2023-06-01 10:00:00 │
│ 2023-06-01 00:00:00 ┆ 105    ┆ 2023-06-01 10:15:00 │
│ 2023-06-02 00:00:00 ┆ 111    ┆ 2023-06-02 09:15:00 │
│ 2023-06-02 00:00:00 ┆ 112    ┆ 2023-06-02 09:30:00 │
│ 2023-06-02 00:00:00 ┆ 113    ┆ 2023-06-02 09:45:00 │
│ 2023-06-02 00:00:00 ┆ 114    ┆ 2023-06-02 10:00:00 │
│ 2023-06-02 00:00:00 ┆ 115    ┆ 2023-06-02 10:15:00 │
└─────────────────────┴────────┴─────────────────────┘
Original DataFrame with Date and Time:
shape: (20, 4)
┌─────────────────────┬───────┬─────────────────────┬──────────┐
│ timestamp           ┆ Close ┆ date                ┆ time     │
│ ---                 ┆ ---   ┆ ---                 ┆ ---      │
│ datetime[μs]        ┆ i64   ┆ datetime[μs]        ┆ time     │
╞═════════════════════╪═══════╪═════════════════════╪══════════╡
│ 2023-06-01 09:00:00 ┆ 100   ┆ 2023-06-01 00:00:00 ┆ 09:00:00 │
│ 2023-06-01 09:15:00 ┆ 101   ┆ 2023-06-01 00:00:00 ┆ 09:15:00 │
│ 2023-06-01 09:30:00 ┆ 102   ┆ 2023-06-01 00:00:00 ┆ 09:30:00 │
│ 2023-06-01 09:45:00 ┆ 103   ┆ 2023-06-01 00:00:00 ┆ 09:45:00 │
│ 2023-06-01 10:00:00 ┆ 104   ┆ 2023-06-01 00:00:00 ┆ 10:00:00 │
│ …                   ┆ …     ┆ …                   ┆ …        │
│ 2023-06-02 10:15:00 ┆ 115   ┆ 2023-06-02 00:00:00 ┆ 10:15:00 │
│ 2023-06-02 10:30:00 ┆ 116   ┆ 2023-06-02 00:00:00 ┆ 10:30:00 │
│ 2023-06-02 10:45:00 ┆ 117   ┆ 2023-06-02 00:00:00 ┆ 10:45:00 │
│ 2023-06-02 11:00:00 ┆ 118   ┆ 2023-06-02 00:00:00 ┆ 11:00:00 │
│ 2023-06-02 11:15:00 ┆ 119   ┆ 2023-06-02 00:00:00 ┆ 11:15:00 │
└─────────────────────┴───────┴─────────────────────┴──────────┘
Result DataFrame:
shape: (20, 3)
┌─────────────────────┬───────┬────────┐
│ timestamp           ┆ Close ┆ fbarh3 │
│ ---                 ┆ ---   ┆ ---    │
│ datetime[μs]        ┆ i64   ┆ i64    │
╞═════════════════════╪═══════╪════════╡
│ 2023-06-01 09:00:00 ┆ 100   ┆ null   │
│ 2023-06-01 09:15:00 ┆ 101   ┆ 101    │
│ 2023-06-01 09:30:00 ┆ 102   ┆ 102    │
│ 2023-06-01 09:45:00 ┆ 103   ┆ 103    │
│ 2023-06-01 10:00:00 ┆ 104   ┆ 104    │
│ …                   ┆ …     ┆ …      │
│ 2023-06-02 10:15:00 ┆ 115   ┆ 115    │
│ 2023-06-02 10:30:00 ┆ 116   ┆ null   │
│ 2023-06-02 10:45:00 ┆ 117   ┆ null   │
│ 2023-06-02 11:00:00 ┆ 118   ┆ null   │
│ 2023-06-02 11:15:00 ┆ 119   ┆ null   │
└─────────────────────┴───────┴────────┘
