How to do if-else in a Polars group_by context

Question

Update: the vectorization rules have since been formalized. The query now runs as expected, with no warning.

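For a dataframe, the goal is to get the mean of one column, `a`, grouped by another column, `b`, provided that the first value of `a` in the group is not null. If it is null, the result for that group should be null.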

Example dataframe:

import polars as pl

df = pl.DataFrame({"a": [None, 1, 2, 3, 4], "b": [1, 1, 2, 2, 2]})

I tried something like this:

df.group_by("b").agg(
    pl.when(pl.col("a").first().is_null()).then(None).otherwise(pl.mean("a"))
)

The result is as expected, but I get a warning saying that `when` may not be guaranteed to work as intended in a group_by context:

The predicate 'col("a").first().is_null()' in 'when->then->otherwise' is not a valid aggregation and might produce a different number of rows than the groupby operation would. This behavior is experimental and may be subject to change
shape: (2, 2)
┌─────┬─────────┐
│ b   ┆ literal │
│ --- ┆ ---     │
│ i64 ┆ f64     │
╞═════╪═════════╡
│ 1   ┆ null    │
│ 2   ┆ 3.0     │
└─────┴─────────┘

Why does this happen, and what is a better way to do if-else inside group_by?

Answer 1

You can use:

```
pl.col("a").is_null().first()
```

instead of:

```
pl.col("a").first().is_null()
```

If we look at the two approaches side by side:

df.group_by("b", maintain_order=True).agg(
   pl.col("a"),
   pl.col("a").is_not_null().alias("yes"),
   pl.col("a").first().is_not_null().alias("no"),
)

shape: (2, 4)
┌─────┬───────────┬────────────────────┬───────┐
│ b   ┆ a         ┆ yes                ┆ no    │
│ --- ┆ ---       ┆ ---                ┆ ---   │
│ i64 ┆ list[i64] ┆ list[bool]         ┆ bool  │
╞═════╪═══════════╪════════════════════╪═══════╡
│ 1   ┆ [null, 1] ┆ [false, true]      ┆ false │
│ 2   ┆ [2, 3, 4] ┆ [true, true, true] ┆ true  │
└─────┴───────────┴────────────────────┴───────┘

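My understanding is that in the `no` case, only the first value of each group (`null` and `2`) is passed to `.is_not_null()`; the rest of the inputs have been "silently discarded".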

Polars knows that the groups of `a` have lengths `2` and `3`, and expects "boolean masks" of the same lengths.

We can instead take the `.first()` of the `yes` values, which gives the same end result:

df.group_by("b", maintain_order=True).agg(
   pl.col("a"),
   pl.col("a").is_not_null().first().alias("yes"),
   pl.col("a").first().is_not_null().alias("no"),
)

shape: (2, 4)
┌─────┬───────────┬───────┬───────┐
│ b   ┆ a         ┆ yes   ┆ no    │
│ --- ┆ ---       ┆ ---   ┆ ---   │
│ i64 ┆ list[i64] ┆ bool  ┆ bool  │
╞═════╪═══════════╪═══════╪═══════╡
│ 1   ┆ [null, 1] ┆ false ┆ false │
│ 2   ┆ [2, 3, 4] ┆ true  ┆ true  │
└─────┴───────────┴───────┴───────┘

But now all of the inputs have been passed to `.is_not_null()`, so the length check passes.
