Look-ahead in group_by_dynamic

Question

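When I run group_by_dynamic on an index column and aggregate a column into a list (to see which values each group contains), the list for a given group and a given index i holds values from the right group, but for indices i, i+1, ... up to the window length set by period. This look-ahead along the index seems to contrast with the rolling_mean function, which by default uses values at lower indices (not higher ones) to compute the rolling mean.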

Is this by design? If so, how can I make group_by_dynamic use lower indices instead? (I am not sure what the offset parameter does, but it does not seem to be what I want.)

Here is an example whose assertion I would expect to pass, but which raises:

import polars as pl

df = pl.DataFrame(
    {
        "index": [0, 0, 1, 1],
        "group": ["banana", "pear", "banana", "pear"],
        "weight": [2, 3, 5, 7],
    }
)

agg = df.group_by_dynamic("index", group_by="group", every="1i", period="2i").agg(pl.col("weight"))

assert((
    agg
    .filter(index=0, group="banana")
    .select("weight")
    .to_series()
    .to_list()
) == [[2]])

Thank you.

Answer 1

As mentioned, one solution is to use rolling.

With your example:

df.group_by_dynamic("index", group_by="group", every="1i", period="2i").agg(pl.col("weight"))

shape: (4, 3)
┌────────┬───────┬───────────┐
│ group  ┆ index ┆ weight    │
│ ---    ┆ ---   ┆ ---       │
│ str    ┆ i64   ┆ list[i64] │
╞════════╪═══════╪═══════════╡
│ banana ┆ 0     ┆ [2, 5]    │
│ banana ┆ 1     ┆ [5]       │
│ pear   ┆ 0     ┆ [3, 7]    │
│ pear   ┆ 1     ┆ [7]       │
└────────┴───────┴───────────┘

With rolling:

df.rolling(index_column="index", period="2i", group_by="group").agg(pl.col("weight"))

shape: (4, 3)
┌────────┬───────┬───────────┐
│ group  ┆ index ┆ weight    │
│ ---    ┆ ---   ┆ ---       │
│ str    ┆ i64   ┆ list[i64] │
╞════════╪═══════╪═══════════╡
│ banana ┆ 0     ┆ [2]       │
│ banana ┆ 1     ┆ [2, 5]    │
│ pear   ┆ 0     ┆ [3]       │
│ pear   ┆ 1     ┆ [3, 7]    │
└────────┴───────┴───────────┘
