Polars group_by_rolling only keeps the index and by columns


Using the same dataframe as in my previous question,

import polars as pl

jan_data = pl.DataFrame(
    [
        pl.Series("Time", ['02/01/2018 07:05', '02/01/2018 07:07', '02/01/2018 07:08', '02/01/2018 07:09', '02/01/2018 07:10', '02/01/2018 07:12', '02/01/2018 07:13', '02/01/2018 07:14', '02/01/2018 07:18', '02/01/2018 07:26', '02/01/2018 07:38', '02/01/2018 07:39', '02/01/2018 07:45', '02/01/2018 07:48', '02/01/2018 07:49', '02/01/2018 07:50', '02/01/2018 07:52', '02/01/2018 07:53', '02/01/2018 07:56', '02/01/2018 07:57'], dtype=pl.Utf8),
        pl.Series("Open", [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93], dtype=pl.Float64),
        pl.Series("High", [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93], dtype=pl.Float64),
        pl.Series("Low", [8.05, 8.01, 8.01, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93], dtype=pl.Float64),
        pl.Series("Close", [8.05, 8.01, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93], dtype=pl.Float64),
        pl.Series("MA14", [8.13, 8.12, 8.11, 8.11, 8.1, 8.09, 8.08, 8.07, 8.06, 8.05, 8.04, 8.03, 8.02, 8.0, 8.0, 7.99, 7.99, 7.98, 7.98, 7.97], dtype=pl.Float64),
        pl.Series("MA28", [8.1, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.08, 8.08, 8.08, 8.07, 8.07, 8.06, 8.06, 8.05, 8.04, 8.04, 8.03], dtype=pl.Float64),
        pl.Series("PVT", [0.0, -0.3478, -0.2904, -0.2904, -0.2904, -0.3527, -0.3527, -0.3677, -0.374, -0.374, -0.404, -0.3376, -0.3376, -0.3489, -0.4792, -0.459, -0.6224, -0.6224, -0.6224, -0.6362], dtype=pl.Float64),
    ]
)

and adding a few more columns based on the earlier answer,

ohlc = jan_data.with_columns(
        DateTime = pl.col("Time").str.to_datetime("%d/%m/%Y %H:%M")
    ).with_columns(
        Date = pl.col("DateTime").dt.date().set_sorted(), 
        t = pl.col("DateTime").dt.time(),
        ones = pl.lit(1),
        vol = (2*(pl.col("High")-pl.col("Low"))/(pl.col("Open")+pl.col("Close"))).round(5),   
    ).with_columns(
        MA = pl.col("Close").rolling_mean(20).over("Date"),
        n = pl.col("ones").cumsum().over("Date")
    ).select(pl.exclude("ones"))

I wanted to try the group_by_rolling method to create forward-rolling lists:

def list_to_string(a):
    t = '['
    t = t + ','.join(str(x) for x in a)
    t = t + ']'

    return t

out = ohlc.group_by_rolling(
        index_column = 'n',
        period = '20i',
        offset = '0i',
        by = "Date",
    ).agg(
        pl.col("Close").map_elements(lambda col: list_to_string(col.to_list())).alias("lists"),
    )

The problem is that print(out.head(10)) gives:

shape: (10, 3)
┌────────────┬─────┬───────────────────────────────────┐
│ Date       ┆ n   ┆ lists                             │
│ ---        ┆ --- ┆ ---                               │
│ date       ┆ i32 ┆ str                               │
╞════════════╪═════╪═══════════════════════════════════╡
│ 2018-01-02 ┆ 1   ┆ [8.01,8.02,8.02,8.02,8.01,8.01,8… │
│ 2018-01-02 ┆ 2   ┆ [8.02,8.02,8.02,8.01,8.01,8.0,7.… │
│ 2018-01-02 ┆ 3   ┆ [8.02,8.02,8.01,8.01,8.0,7.99,7.… │
│ 2018-01-02 ┆ 4   ┆ [8.02,8.01,8.01,8.0,7.99,7.99,7.… │
│ …          ┆ …   ┆ …                                 │
│ 2018-01-02 ┆ 7   ┆ [8.0,7.99,7.99,7.98,7.99,7.99,7.… │
│ 2018-01-02 ┆ 8   ┆ [7.99,7.99,7.98,7.99,7.99,7.98,7… │
│ 2018-01-02 ┆ 9   ┆ [7.99,7.98,7.99,7.99,7.98,7.94,7… │
│ 2018-01-02 ┆ 10  ┆ [7.98,7.99,7.99,7.98,7.94,7.96,7… │
└────────────┴─────┴───────────────────────────────────┘

So most of the original columns have been dropped. Is there a way to keep them?

Tags: group-by, aggregate, python-polars

1 Answer

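The result of any group_by/agg contains only the columns you ask for: the grouping keys (here the index column n and the by column Date) plus the expressions you pass to agg. Nothing else is carried over.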

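It isn't clear in what form you want to keep the original columns once they are grouped, but this is the most general approach: every column ends up holding a list of all the original values in its group.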

ohlc.group_by_rolling(
        index_column = 'n',
        period = '20i',
        offset = '0i',
        by = "Date",
    ).agg(
        pl.col("Close").map_elements(lambda col: list_to_string(col.to_list())).alias("lists"),
        pl.exclude('n','Date')
    )
shape: (20, 15)
┌────────────┬─────┬────────────┬────────────┬───┬────────────┬────────────┬───────────┬───────────┐
│ Date       ┆ n   ┆ lists      ┆ Time       ┆ … ┆ DateTime   ┆ t          ┆ vol       ┆ MA        │
│ ---        ┆ --- ┆ ---        ┆ ---        ┆   ┆ ---        ┆ ---        ┆ ---       ┆ ---       │
│ date       ┆ i32 ┆ str        ┆ list[str]  ┆   ┆ list[datet ┆ list[time] ┆ list[f64] ┆ list[f64] │
│            ┆     ┆            ┆            ┆   ┆ ime[μs]]   ┆            ┆           ┆           │
╞════════════╪═════╪════════════╪════════════╪═══╪════════════╪════════════╪═══════════╪═══════════╡
│ 2018-01-02 ┆ 1   ┆ [8.01,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:07:00, ┆ [0.00374, ┆ [null,    │
│            ┆     ┆ ,8.02,8.02 ┆ 18 07:07", ┆   ┆ 2          ┆ 07:08:00,  ┆ 0.00125,  ┆ null, …   │
│            ┆     ┆ ,8.01,8.01 ┆ "02/01/201 ┆   ┆ 07:07:00,  ┆ …          ┆ … 0.0]    ┆ 7.9855]   │
│            ┆     ┆ ,8…        ┆ 8…         ┆   ┆ 2018-01-02 ┆ 07:57:00]  ┆           ┆           │
│            ┆     ┆            ┆            ┆   ┆ …          ┆            ┆           ┆           │
│ 2018-01-02 ┆ 2   ┆ [8.02,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:08:00, ┆ [0.00125, ┆ [null,    │
│            ┆     ┆ ,8.02,8.01 ┆ 18 07:08", ┆   ┆ 2          ┆ 07:09:00,  ┆ 0.0, …    ┆ null, …   │
│            ┆     ┆ ,8.01,8.0, ┆ "02/01/201 ┆   ┆ 07:08:00,  ┆ …          ┆ 0.0]      ┆ 7.9855]   │
│            ┆     ┆ 7.…        ┆ 8…         ┆   ┆ 2018-01-02 ┆ 07:57:00]  ┆           ┆           │
│            ┆     ┆            ┆            ┆   ┆ …          ┆            ┆           ┆           │
│ 2018-01-02 ┆ 3   ┆ [8.02,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:09:00, ┆ [0.0,     ┆ [null,    │
│            ┆     ┆ ,8.01,8.01 ┆ 18 07:09", ┆   ┆ 2          ┆ 07:10:00,  ┆ 0.0, …    ┆ null, …   │
│            ┆     ┆ ,8.0,7.99, ┆ "02/01/201 ┆   ┆ 07:09:00,  ┆ …          ┆ 0.0]      ┆ 7.9855]   │
│            ┆     ┆ 7.…        ┆ 8…         ┆   ┆ 2018-01-02 ┆ 07:57:00]  ┆           ┆           │
│            ┆     ┆            ┆            ┆   ┆ …          ┆            ┆           ┆           │
│ 2018-01-02 ┆ 4   ┆ [8.02,8.01 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:10:00, ┆ [0.0,     ┆ [null,    │
│            ┆     ┆ ,8.01,8.0, ┆ 18 07:10", ┆   ┆ 2          ┆ 07:12:00,  ┆ 0.0, …    ┆ null, …   │
│            ┆     ┆ 7.99,7.99, ┆ "02/01/201 ┆   ┆ 07:10:00,  ┆ …          ┆ 0.0]      ┆ 7.9855]   │
...
│            ┆     ┆            ┆            ┆   ┆ 07:57:00]  ┆            ┆           ┆           │
│ -109782-01 ┆ 20  ┆ []         ┆ []         ┆ … ┆ []         ┆ []         ┆ []        ┆ []        │
│ -05        ┆     ┆            ┆            ┆   ┆            ┆            ┆           ┆           │
└────────────┴─────┴────────────┴────────────┴───┴────────────┴────────────┴───────────┴───────────┘

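Here pl.exclude means every column except the ones you name. It is shorthand for pl.all().exclude(), which is more intuitive but takes more typing. Since you have already requested 'n' and 'Date' in the group_by, you don't want to request them again with pl.all().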
