Polars rolling only retains the index and by columns


Using the same dataframe as in my previous question,

import polars as pl

df = pl.DataFrame(
    [
        pl.Series("Time", ['02/01/2018 07:05', '02/01/2018 07:07', '02/01/2018 07:08', '02/01/2018 07:09', '02/01/2018 07:10', '02/01/2018 07:12', '02/01/2018 07:13', '02/01/2018 07:14', '02/01/2018 07:18', '02/01/2018 07:26', '02/01/2018 07:38', '02/01/2018 07:39', '02/01/2018 07:45', '02/01/2018 07:48', '02/01/2018 07:49', '02/01/2018 07:50', '02/01/2018 07:52', '02/01/2018 07:53', '02/01/2018 07:56', '02/01/2018 07:57'], dtype=pl.String),
        pl.Series("Open", [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93], dtype=pl.Float64),
        pl.Series("High", [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93], dtype=pl.Float64),
        pl.Series("Low", [8.05, 8.01, 8.01, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93], dtype=pl.Float64),
        pl.Series("Close", [8.05, 8.01, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93], dtype=pl.Float64),
        pl.Series("MA14", [8.13, 8.12, 8.11, 8.11, 8.1, 8.09, 8.08, 8.07, 8.06, 8.05, 8.04, 8.03, 8.02, 8.0, 8.0, 7.99, 7.99, 7.98, 7.98, 7.97], dtype=pl.Float64),
        pl.Series("MA28", [8.1, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.08, 8.08, 8.08, 8.07, 8.07, 8.06, 8.06, 8.05, 8.04, 8.04, 8.03], dtype=pl.Float64),
        pl.Series("PVT", [0.0, -0.3478, -0.2904, -0.2904, -0.2904, -0.3527, -0.3527, -0.3677, -0.374, -0.374, -0.404, -0.3376, -0.3376, -0.3489, -0.4792, -0.459, -0.6224, -0.6224, -0.6224, -0.6362], dtype=pl.Float64),
    ]
)

and adding a few more columns using the earlier answer,

ohlc = df.with_columns(
        DateTime = pl.col("Time").str.to_datetime()
    ).with_columns(
        Date = pl.col("DateTime").dt.date().set_sorted(), 
        t = pl.col("DateTime").dt.time(),
        ones = pl.lit(1),
        vol = (2*(pl.col("High")-pl.col("Low"))/(pl.col("Open")+pl.col("Close"))).round(5),   
    ).with_columns(
        MA = pl.col("Close").rolling_mean(20).over("Date"),
        n = pl.col("ones").cum_sum().over("Date")
    ).select(pl.exclude("ones"))

I would like to try using the rolling method to create forward-looking rolling lists:

pl.Config(fmt_str_lengths=100) # increase repr

out = ohlc.rolling(
        index_column = 'n',
        period = '20i',
        offset = '0i',
        group_by = "Date",
    ).agg(
        pl.format("[{}]", pl.col("Close").str.concat(",")).alias("lists")
    )

The problem is that print(out.head()) gives,

shape: (5, 3)
┌────────────┬─────┬─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Date       ┆ n   ┆ lists                                                                                           │
│ ---        ┆ --- ┆ ---                                                                                             │
│ date       ┆ i32 ┆ str                                                                                             │
╞════════════╪═════╪═════════════════════════════════════════════════════════════════════════════════════════════════╡
│ 2018-01-02 ┆ 1   ┆ [8.01,8.02,8.02,8.02,8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93] │
│ 2018-01-02 ┆ 2   ┆ [8.02,8.02,8.02,8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93]      │
│ 2018-01-02 ┆ 3   ┆ [8.02,8.02,8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93]           │
│ 2018-01-02 ┆ 4   ┆ [8.02,8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93]                │
│ 2018-01-02 ┆ 5   ┆ [8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93]                     │
└────────────┴─────┴─────────────────────────────────────────────────────────────────────────────────────────────────┘

So most of the original columns are dropped. Is there a way to keep them? Ideally, the output would look like this,

shape: (5, 15)
┌──────────────────┬──────┬──────┬───┬──────┬─────┬─────────────────────────────────┐
│ Time             ┆ Open ┆ High ┆ … ┆ MA   ┆ n   ┆ lists                           │
│ ---              ┆ ---  ┆ ---  ┆   ┆ ---  ┆ --- ┆ ---                             │
│ str              ┆ f64  ┆ f64  ┆   ┆ f64  ┆ i32 ┆ str                             │
╞══════════════════╪══════╪══════╪═══╪══════╪═════╪═════════════════════════════════╡
│ 02/01/2018 07:05 ┆ 8.05 ┆ 8.05 ┆ … ┆ null ┆ 1   ┆ [8.01,8.02,8.02,8.02,8.01,8.01… │
│ 02/01/2018 07:07 ┆ 8.04 ┆ 8.04 ┆ … ┆ null ┆ 2   ┆ [8.02,8.02,8.02,8.01,8.01,8.0,… │
│ 02/01/2018 07:08 ┆ 8.02 ┆ 8.02 ┆ … ┆ null ┆ 3   ┆ [8.02,8.02,8.01,8.01,8.0,7.99,… │
│ 02/01/2018 07:09 ┆ 8.02 ┆ 8.02 ┆ … ┆ null ┆ 4   ┆ [8.02,8.01,8.01,8.0,7.99,7.99,… │
│ 02/01/2018 07:10 ┆ 8.02 ┆ 8.02 ┆ … ┆ null ┆ 5   ┆ [8.01,8.01,8.0,7.99,7.99,7.98,… │
└──────────────────┴──────┴──────┴───┴──────┴─────┴─────────────────────────────────┘
1 Answer

The result of any group_by / agg contains only the columns you told it you care about, and nothing else.

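It's not clear in what form you want to keep the original columns while grouping, but this is the most general way, where each column ends up holding a list of all the original values of its group.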

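# list_to_string is not defined in this post; the helper below is an assumed
# stand-in that renders a list as "[a,b,c]".
def list_to_string(values):
    return "[" + ",".join(str(v) for v in values) + "]"

# In newer Polars releases group_by_rolling(..., by=...) is spelled rolling(..., group_by=...).
ohlc.rolling(
        index_column = 'n',
        period = '20i',
        offset = '0i',
        group_by = "Date",
    ).agg(
        pl.col("Close").map_elements(lambda col: list_to_string(col.to_list())).alias("lists"),
        pl.exclude('n','Date')
    )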
shape: (20, 15)
┌────────────┬─────┬────────────┬────────────┬───┬────────────┬────────────┬───────────┬───────────┐
│ Date       ┆ n   ┆ lists      ┆ Time       ┆ … ┆ DateTime   ┆ t          ┆ vol       ┆ MA        │
│ ---        ┆ --- ┆ ---        ┆ ---        ┆   ┆ ---        ┆ ---        ┆ ---       ┆ ---       │
│ date       ┆ i32 ┆ str        ┆ list[str]  ┆   ┆ list[datet ┆ list[time] ┆ list[f64] ┆ list[f64] │
│            ┆     ┆            ┆            ┆   ┆ ime[μs]]   ┆            ┆           ┆           │
╞════════════╪═════╪════════════╪════════════╪═══╪════════════╪════════════╪═══════════╪═══════════╡
│ 2018-01-02 ┆ 1   ┆ [8.01,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:07:00, ┆ [0.00374, ┆ [null,    │
│            ┆     ┆ ,8.02,8.02 ┆ 18 07:07", ┆   ┆ 2          ┆ 07:08:00,  ┆ 0.00125,  ┆ null, …   │
│            ┆     ┆ ,8.01,8.01 ┆ "02/01/201 ┆   ┆ 07:07:00,  ┆ …          ┆ … 0.0]    ┆ 7.9855]   │
│            ┆     ┆ ,8…        ┆ 8…         ┆   ┆ 2018-01-02 ┆ 07:57:00]  ┆           ┆           │
│            ┆     ┆            ┆            ┆   ┆ …          ┆            ┆           ┆           │
│ 2018-01-02 ┆ 2   ┆ [8.02,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:08:00, ┆ [0.00125, ┆ [null,    │
│            ┆     ┆ ,8.02,8.01 ┆ 18 07:08", ┆   ┆ 2          ┆ 07:09:00,  ┆ 0.0, …    ┆ null, …   │
│            ┆     ┆ ,8.01,8.0, ┆ "02/01/201 ┆   ┆ 07:08:00,  ┆ …          ┆ 0.0]      ┆ 7.9855]   │
│            ┆     ┆ 7.…        ┆ 8…         ┆   ┆ 2018-01-02 ┆ 07:57:00]  ┆           ┆           │
│            ┆     ┆            ┆            ┆   ┆ …          ┆            ┆           ┆           │
│ 2018-01-02 ┆ 3   ┆ [8.02,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:09:00, ┆ [0.0,     ┆ [null,    │
│            ┆     ┆ ,8.01,8.01 ┆ 18 07:09", ┆   ┆ 2          ┆ 07:10:00,  ┆ 0.0, …    ┆ null, …   │
│            ┆     ┆ ,8.0,7.99, ┆ "02/01/201 ┆   ┆ 07:09:00,  ┆ …          ┆ 0.0]      ┆ 7.9855]   │
│            ┆     ┆ 7.…        ┆ 8…         ┆   ┆ 2018-01-02 ┆ 07:57:00]  ┆           ┆           │
│            ┆     ┆            ┆            ┆   ┆ …          ┆            ┆           ┆           │
│ 2018-01-02 ┆ 4   ┆ [8.02,8.01 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:10:00, ┆ [0.0,     ┆ [null,    │
│            ┆     ┆ ,8.01,8.0, ┆ 18 07:10", ┆   ┆ 2          ┆ 07:12:00,  ┆ 0.0, …    ┆ null, …   │
│            ┆     ┆ 7.99,7.99, ┆ "02/01/201 ┆   ┆ 07:10:00,  ┆ …          ┆ 0.0]      ┆ 7.9855]   │
...
│            ┆     ┆            ┆            ┆   ┆ 07:57:00]  ┆            ┆           ┆           │
│ -109782-01 ┆ 20  ┆ []         ┆ []         ┆ … ┆ []         ┆ []         ┆ []        ┆ []        │
│ -05        ┆     ┆            ┆            ┆   ┆            ┆            ┆           ┆           │
└────────────┴─────┴────────────┴────────────┴───┴────────────┴────────────┴───────────┴───────────┘

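Here pl.exclude means all columns except the ones you list in it. It is a shortcut for pl.all().exclude(), which is more intuitive but more typing. Since you have already requested 'n' and 'Date' as the index and grouping columns, you don't want to request them again with pl.all().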

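Based on the edit, you should do a self-join to get the columns back. Also, the UDF can be replicated with expressions, so that is included here too.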

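# Build the windowed string per (Date, n), then join it back onto the original frame.
# In newer Polars releases group_by_rolling(..., by=...) is spelled rolling(..., group_by=...).
ohlc.join(ohlc.rolling(
        index_column = 'n',
        period = '20i',
        offset = '0i',
        group_by = "Date",
    ).agg(
        pl.col("Close").cast(pl.String).str.concat(',').alias('lists')
    ).with_columns(
        (pl.lit("[") + pl.col('lists') + pl.lit("]")).alias('lists')
    ), on=['Date','n']
)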
shape: (19, 15)
┌──────────────────┬──────┬──────┬──────┬───┬─────────┬──────┬─────┬───────────────────────────────┐
│ Time             ┆ Open ┆ High ┆ Low  ┆ … ┆ vol     ┆ MA   ┆ n   ┆ lists                         │
│ ---              ┆ ---  ┆ ---  ┆ ---  ┆   ┆ ---     ┆ ---  ┆ --- ┆ ---                           │
│ str              ┆ f64  ┆ f64  ┆ f64  ┆   ┆ f64     ┆ f64  ┆ i32 ┆ str                           │
╞══════════════════╪══════╪══════╪══════╪═══╪═════════╪══════╪═════╪═══════════════════════════════╡
│ 02/01/2018 07:05 ┆ 8.05 ┆ 8.05 ┆ 8.05 ┆ … ┆ 0.0     ┆ null ┆ 1   ┆ [8.01,8.02,8.02,8.02,8.01,8.0 │
│                  ┆      ┆      ┆      ┆   ┆         ┆      ┆     ┆ 1,8…                          │
│ 02/01/2018 07:07 ┆ 8.04 ┆ 8.04 ┆ 8.01 ┆ … ┆ 0.00374 ┆ null ┆ 2   ┆ [8.02,8.02,8.02,8.01,8.01,8.0 │
│                  ┆      ┆      ┆      ┆   ┆         ┆      ┆     ┆ ,7.…                          │
│ 02/01/2018 07:08 ┆ 8.02 ┆ 8.02 ┆ 8.01 ┆ … ┆ 0.00125 ┆ null ┆ 3   ┆ [8.02,8.02,8.01,8.01,8.0,7.99 │
│                  ┆      ┆      ┆      ┆   ┆         ┆      ┆     ┆ ,7.…                          │
│ 02/01/2018 07:09 ┆ 8.02 ┆ 8.02 ┆ 8.02 ┆ … ┆ 0.0     ┆ null ┆ 4   ┆ [8.02,8.01,8.01,8.0,7.99,7.99 │
│                  ┆      ┆      ┆      ┆   ┆         ┆      ┆     ┆ ,7.…                          │
│ …                ┆ …    ┆ …    ┆ …    ┆ … ┆ …       ┆ …    ┆ …   ┆ …                             │
│ 02/01/2018 07:50 ┆ 7.96 ┆ 7.96 ┆ 7.96 ┆ … ┆ 0.0     ┆ null ┆ 16  ┆ [7.94,7.94,7.94,7.93]         │
│ 02/01/2018 07:52 ┆ 7.95 ┆ 7.95 ┆ 7.94 ┆ … ┆ 0.00126 ┆ null ┆ 17  ┆ [7.94,7.94,7.93]              │
│ 02/01/2018 07:53 ┆ 7.94 ┆ 7.94 ┆ 7.94 ┆ … ┆ 0.0     ┆ null ┆ 18  ┆ [7.94,7.93]                   │
│ 02/01/2018 07:56 ┆ 7.94 ┆ 7.94 ┆ 7.94 ┆ … ┆ 0.0     ┆ null ┆ 19  ┆ [7.93]                        │
└──────────────────┴──────┴──────┴──────┴───┴─────────┴──────┴─────┴───────────────────────────────┘