Expanding max per daily group in Polars

Problem description

I have OHLC stock data in a Polars dataframe, and for each day I want to compute the expanding maximum of the close price between two times:

import datetime
import polars as pl

start = '09:15'
end = '10:15'
start_time = datetime.time.fromisoformat(start)
end_time = datetime.time.fromisoformat(end)


# Filter to the time range, then compute the expanding max within each day
max_values = (
    und_df2
    .filter(
        (und_df2['timestamp'].dt.time() >= start_time)
        & (und_df2['timestamp'].dt.time() <= end_time)
    )
    .group_by_dynamic('timestamp', every='1d', closed='left')
    .agg(pl.col('Close').cum_max().alias('fbarh3'))
)


# df.with_columns(pl.col(date_col).dt.truncate("1d").alias("date")) \
#              .join(
#                  df.group_by_dynamic("timestamp", every="1d").agg(agg_func.alias(alias))
#                    .with_columns(pl.col(alias).shift(shift_value))
#                    ,
#                  left_on="date", right_on="timestamp", how='left'
#              ) \
#              .drop("date")


# Join the maximum values back to the original dataframe

und_df2 = (
    und_df2
    .with_columns(pl.col('timestamp').dt.truncate('1d').alias('date'))
    .join(max_values, left_on='date', right_on='timestamp', how='left')
    .drop('date')
)

The calculation of max_values is correct: it gives the expanding maximum of the close price between the start and end times. However, because I filtered the dataframe down to that time range, I lose the time component after the aggregation. How do I get it back into the original frame (und_df2)?

This is the max_values that gets generated:

shape: (824, 2)
┌─────────────────────┬──────────────────────────────────┐
│ timestamp           ┆ fbarh3                           │
│ ---                 ┆ ---                              │
│ datetime[μs]        ┆ list[f64]                        │
╞═════════════════════╪══════════════════════════════════╡
│ 2020-03-02 00:00:00 ┆ [11356.8, 11358.45, … 11388.65]  │
│ 2020-03-03 00:00:00 ┆ [11293.65, 11293.95, … 11310.7]  │
│ 2020-03-04 00:00:00 ┆ [11296.95, 11296.95, … 11325.25] │
│ 2020-03-05 00:00:00 ┆ [11312.65, 11312.65, … 11312.65] │
│ 2020-03-06 00:00:00 ┆ [10894.8, 10914.85, … 10965.3]   │
└─────────────────────┴──────────────────────────────────┘
Each day, from the start time to the end time, the output should keep expanding (a running highest, i.e. cum_max). I just can't seem to join max_values back onto und_df2.

python dataframe python-polars rust-polars
1 Answer

Your approach is basically correct, but you don't need group_by_dynamic. You can do it this way (with an example dataframe here; in the future, please post sample data when asking for help). I have printed every step so you can follow what is happening:

import polars as pl
from datetime import time

data = {
    'timestamp': [
        '2023-06-01 09:00', '2023-06-01 09:15', '2023-06-01 09:30', '2023-06-01 09:45', '2023-06-01 10:00', '2023-06-01 10:15',
        '2023-06-01 10:30', '2023-06-01 10:45', '2023-06-01 11:00', '2023-06-01 11:15',
        '2023-06-02 09:00', '2023-06-02 09:15', '2023-06-02 09:30', '2023-06-02 09:45', '2023-06-02 10:00', '2023-06-02 10:15',
        '2023-06-02 10:30', '2023-06-02 10:45', '2023-06-02 11:00', '2023-06-02 11:15'
    ],
    'Close': [
        100, 101, 102, 103, 104, 105,
        106, 107, 108, 109,
        110, 111, 112, 113, 114, 115,
        116, 117, 118, 119
    ]
}

und_df2 = pl.DataFrame(data)
und_df2 = und_df2.with_columns(pl.col('timestamp').str.strptime(pl.Datetime, format='%Y-%m-%d %H:%M'))

start = '09:15'
end = '10:15'
start_time = time.fromisoformat(start)
end_time = time.fromisoformat(end)

und_df2 = und_df2.sort('timestamp')
filtered_df = und_df2.filter((und_df2['timestamp'].dt.time() >= start_time) & (und_df2['timestamp'].dt.time() <= end_time))

print("Filtered DataFrame:")
print(filtered_df)

filtered_df = filtered_df.with_columns(pl.col('timestamp').dt.truncate('1d').alias('date'))

# maintain_order=True keeps the groups in the order they appear, so the
# exploded rows line up positionally with filtered_df's sorted timestamps
max_values = filtered_df.group_by('date', maintain_order=True).agg(pl.col('Close').cum_max().alias('fbarh3'))
max_values = max_values.explode('fbarh3')
max_values = max_values.with_columns([
    filtered_df['timestamp']
])

print("Max Values with Date and Time:")
print(max_values)

und_df2 = und_df2.with_columns([
    pl.col('timestamp').dt.truncate('1d').alias('date'),
    pl.col('timestamp').dt.time().alias('time')
])

print("Original DataFrame with Date and Time:")
print(und_df2)

result_df = und_df2.join(max_values, on=['date', 'timestamp'], how='left')
result_df = result_df.drop(['date', 'time'])

print("Result DataFrame:")
print(result_df)

This results in:

Filtered DataFrame:
shape: (10, 2)
┌─────────────────────┬───────┐
│ timestamp           ┆ Close │
│ ---                 ┆ ---   │
│ datetime[μs]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2023-06-01 09:15:00 ┆ 101   │
│ 2023-06-01 09:30:00 ┆ 102   │
│ 2023-06-01 09:45:00 ┆ 103   │
│ 2023-06-01 10:00:00 ┆ 104   │
│ 2023-06-01 10:15:00 ┆ 105   │
│ 2023-06-02 09:15:00 ┆ 111   │
│ 2023-06-02 09:30:00 ┆ 112   │
│ 2023-06-02 09:45:00 ┆ 113   │
│ 2023-06-02 10:00:00 ┆ 114   │
│ 2023-06-02 10:15:00 ┆ 115   │
└─────────────────────┴───────┘
Max Values with Date and Time:
shape: (10, 3)
┌─────────────────────┬────────┬─────────────────────┐
│ date                ┆ fbarh3 ┆ timestamp           │
│ ---                 ┆ ---    ┆ ---                 │
│ datetime[μs]        ┆ i64    ┆ datetime[μs]        │
╞═════════════════════╪════════╪═════════════════════╡
│ 2023-06-01 00:00:00 ┆ 101    ┆ 2023-06-01 09:15:00 │
│ 2023-06-01 00:00:00 ┆ 102    ┆ 2023-06-01 09:30:00 │
│ 2023-06-01 00:00:00 ┆ 103    ┆ 2023-06-01 09:45:00 │
│ 2023-06-01 00:00:00 ┆ 104    ┆ 2023-06-01 10:00:00 │
│ 2023-06-01 00:00:00 ┆ 105    ┆ 2023-06-01 10:15:00 │
│ 2023-06-02 00:00:00 ┆ 111    ┆ 2023-06-02 09:15:00 │
│ 2023-06-02 00:00:00 ┆ 112    ┆ 2023-06-02 09:30:00 │
│ 2023-06-02 00:00:00 ┆ 113    ┆ 2023-06-02 09:45:00 │
│ 2023-06-02 00:00:00 ┆ 114    ┆ 2023-06-02 10:00:00 │
│ 2023-06-02 00:00:00 ┆ 115    ┆ 2023-06-02 10:15:00 │
└─────────────────────┴────────┴─────────────────────┘
Original DataFrame with Date and Time:
shape: (20, 4)
┌─────────────────────┬───────┬─────────────────────┬──────────┐
│ timestamp           ┆ Close ┆ date                ┆ time     │
│ ---                 ┆ ---   ┆ ---                 ┆ ---      │
│ datetime[μs]        ┆ i64   ┆ datetime[μs]        ┆ time     │
╞═════════════════════╪═══════╪═════════════════════╪══════════╡
│ 2023-06-01 09:00:00 ┆ 100   ┆ 2023-06-01 00:00:00 ┆ 09:00:00 │
│ 2023-06-01 09:15:00 ┆ 101   ┆ 2023-06-01 00:00:00 ┆ 09:15:00 │
│ 2023-06-01 09:30:00 ┆ 102   ┆ 2023-06-01 00:00:00 ┆ 09:30:00 │
│ 2023-06-01 09:45:00 ┆ 103   ┆ 2023-06-01 00:00:00 ┆ 09:45:00 │
│ 2023-06-01 10:00:00 ┆ 104   ┆ 2023-06-01 00:00:00 ┆ 10:00:00 │
│ …                   ┆ …     ┆ …                   ┆ …        │
│ 2023-06-02 10:15:00 ┆ 115   ┆ 2023-06-02 00:00:00 ┆ 10:15:00 │
│ 2023-06-02 10:30:00 ┆ 116   ┆ 2023-06-02 00:00:00 ┆ 10:30:00 │
│ 2023-06-02 10:45:00 ┆ 117   ┆ 2023-06-02 00:00:00 ┆ 10:45:00 │
│ 2023-06-02 11:00:00 ┆ 118   ┆ 2023-06-02 00:00:00 ┆ 11:00:00 │
│ 2023-06-02 11:15:00 ┆ 119   ┆ 2023-06-02 00:00:00 ┆ 11:15:00 │
└─────────────────────┴───────┴─────────────────────┴──────────┘
Result DataFrame:
shape: (20, 3)
┌─────────────────────┬───────┬────────┐
│ timestamp           ┆ Close ┆ fbarh3 │
│ ---                 ┆ ---   ┆ ---    │
│ datetime[μs]        ┆ i64   ┆ i64    │
╞═════════════════════╪═══════╪════════╡
│ 2023-06-01 09:00:00 ┆ 100   ┆ null   │
│ 2023-06-01 09:15:00 ┆ 101   ┆ 101    │
│ 2023-06-01 09:30:00 ┆ 102   ┆ 102    │
│ 2023-06-01 09:45:00 ┆ 103   ┆ 103    │
│ 2023-06-01 10:00:00 ┆ 104   ┆ 104    │
│ …                   ┆ …     ┆ …      │
│ 2023-06-02 10:15:00 ┆ 115   ┆ 115    │
│ 2023-06-02 10:30:00 ┆ 116   ┆ null   │
│ 2023-06-02 10:45:00 ┆ 117   ┆ null   │
│ 2023-06-02 11:00:00 ┆ 118   ┆ null   │
│ 2023-06-02 11:15:00 ┆ 119   ┆ null   │
└─────────────────────┴───────┴────────┘
