Implementing cum_count() over a subset of columns in Polars

Question

Consider this dummy dataset:

import pandas as pd
import numpy as np

np.random.seed(44)
num_rows = 8

data = {
    'item_id': np.random.choice(['A', 'B'], num_rows),
    'store_id': np.random.choice([1, 2], num_rows),
    'sold_quantity': np.random.randint(0, 5, num_rows),  
    'total_sku_count': np.random.choice([0, 1], num_rows)  
}

df = pd.DataFrame(data)

  item_id  store_id  sold_quantity  total_sku_count
0       A         2              3                1
1       B         2              3                1
2       B         1              3                1
3       B         1              4                0
4       B         2              1                0
5       B         2              1                0
6       A         1              0                1
7       A         2              4                1

In pandas, I can compute a cumulative count over a subset of columns like this:


subset = ["item_id",'store_id']
df['cum_count'] = df.groupby(subset).cumcount()+1

  item_id  store_id  sold_quantity  total_sku_count  count  cum_count
0       A         2              3                1      1          1
1       B         2              3                1      1          1
2       B         1              3                1      1          1
3       B         1              4                0      2          2
4       B         2              1                0      2          2
5       B         2              1                0      3          3
6       A         1              0                1      1          1
7       A         2              4                1      2          2

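I'm trying to implement the same thing in Polars. Since I have relatively little experience with Polars, I'm having trouble getting it right. I tried something like the following, but it did not give the expected result: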

subset = ["item_id",'store_id']
df = df.with_columns((pl.struct(subset).over(subset).cum_count()).alias("cum_counts"))

If there is a way to achieve this, please help. Thank you very much for your support.

Answer 1

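The position of .over() matters. You want .cum_count() to be evaluated per group, so it must come before .over(). In the output shown here the count starts at 0, so add 1 if you need to match the pandas result.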

import polars as pl

# df is assumed to be a Polars DataFrame here, e.g. pl.from_pandas(df)
df.with_columns(
   pl.struct(subset).cum_count().over(subset).alias("cum_counts")
)

shape: (8, 5)
┌─────────┬──────────┬───────────────┬─────────────────┬────────────┐
│ item_id ┆ store_id ┆ sold_quantity ┆ total_sku_count ┆ cum_counts │
│ ---     ┆ ---      ┆ ---           ┆ ---             ┆ ---        │
│ str     ┆ i64      ┆ i64           ┆ i64             ┆ u32        │
╞═════════╪══════════╪═══════════════╪═════════════════╪════════════╡
│ A       ┆ 2        ┆ 3             ┆ 1               ┆ 0          │
│ B       ┆ 2        ┆ 3             ┆ 1               ┆ 0          │
│ B       ┆ 1        ┆ 3             ┆ 1               ┆ 0          │
│ B       ┆ 1        ┆ 4             ┆ 0               ┆ 1          │
│ B       ┆ 2        ┆ 1             ┆ 0               ┆ 1          │
│ B       ┆ 2        ┆ 1             ┆ 0               ┆ 2          │
│ A       ┆ 1        ┆ 0             ┆ 1               ┆ 0          │
│ A       ┆ 2        ┆ 4             ┆ 1               ┆ 1          │
└─────────┴──────────┴───────────────┴─────────────────┴────────────┘
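
To make explicit what a per-group cumulative count computes, here is a minimal plain-Python sketch, independent of both pandas and Polars. The helper name grouped_cum_count is hypothetical and only for illustration; the rows are the item_id and store_id values from the table in the question. It produces the 1-based count, as in the pandas version.

```python
from collections import defaultdict


def grouped_cum_count(rows, keys):
    """Return the 1-based running count of each row within its key group.

    Hypothetical helper for illustration only; not part of pandas or Polars.
    """
    seen = defaultdict(int)  # how many rows of each group we have seen so far
    counts = []
    for row in rows:
        group = tuple(row[k] for k in keys)
        seen[group] += 1
        counts.append(seen[group])
    return counts


# item_id / store_id pairs copied from the question's table, in row order
rows = [
    {"item_id": "A", "store_id": 2},
    {"item_id": "B", "store_id": 2},
    {"item_id": "B", "store_id": 1},
    {"item_id": "B", "store_id": 1},
    {"item_id": "B", "store_id": 2},
    {"item_id": "B", "store_id": 2},
    {"item_id": "A", "store_id": 1},
    {"item_id": "A", "store_id": 2},
]

result = grouped_cum_count(rows, ["item_id", "store_id"])
print(result)  # [1, 1, 1, 2, 2, 3, 1, 2]
```

Subtracting 1 from each value gives the 0-based numbering shown in the Polars output above.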
