Polars: summing one column by the values of another column inside a group_by


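Inside a group_by, I want to compute the sum of one column broken down by the values of another column. This is close to what pl.Expr.value_counts does (see the example below), except that instead of counting I want to apply a function (for example sum) to a specific column, in this case the Price column.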

I know I could run the group_by on Weather + Windy and aggregate after that, but I can't do that here: I have many other aggregations that must be computed on the Weather group_by alone.

import polars as pl
df = pl.DataFrame(
    data = {
            "Weather":["Rain","Sun","Rain","Sun","Rain","Sun","Rain","Sun"],
            "Price":[1,2,3,4,5,6,7,8],
            "Windy":["Y","Y","Y","Y","N","N","N","N"]
    }
)

With value_counts I can get the number of rows for each Windy value within each Weather group:
df_agg = (df
        .group_by("Weather")
        .agg(
            pl.col("Windy")
                .value_counts()
                .alias("Price")
        )
)

shape: (2, 2)
┌─────────┬────────────────────┐
│ Weather ┆ Price              │
│ ---     ┆ ---                │
│ str     ┆ list[struct[2]]    │
╞═════════╪════════════════════╡
│ Sun     ┆ [{"Y",2}, {"N",2}] │
│ Rain    ┆ [{"Y",2}, {"N",2}] │
└─────────┴────────────────────┘

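I would like to do something like this (custom_fun_on_other_col is a made-up method, not an existing Polars API):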

df_agg =(df
        .group_by("Weather")
        .agg(
            pl.col("Windy")
                .custom_fun_on_other_col("Price",sum)
                .alias("Price")
        )
)

And this is the result I want:


shape: (2, 2)
┌─────────┬────────────────────┐
│ Weather ┆ Price              │
│ ---     ┆ ---                │
│ str     ┆ list[struct[2]]    │
╞═════════╪════════════════════╡
│ Sun     ┆ [{"Y",6},{"N",14}] │
│ Rain    ┆ [{"Y",4},{"N",12}] │
└─────────┴────────────────────┘
python python-polars
1 answer
1 vote

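One option is to build a temporary DataFrame with the per-(Weather, Windy) sums and then join it to the result of your main aggregation.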

tmp = df.group_by("Weather", "Windy").agg(pl.col("Price").sum())\
        .select(pl.col("Weather"), pl.struct("Windy", "Price"))\
        .group_by("Weather").agg("Windy")
df.group_by("Weather").agg(
    # your other aggregations ...
).join(tmp, on="Weather")
┌─────────┬─────────────────────┐
│ Weather ┆ Windy               │
│ ---     ┆ ---                 │
│ str     ┆ list[struct[2]]     │
╞═════════╪═════════════════════╡
│ Rain    ┆ [{"Y",4}, {"N",12}] │
│ Sun     ┆ [{"N",14}, {"Y",6}] │
└─────────┴─────────────────────┘