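Is it possible to aggregate lists in a groupby into a single list, rather than a list of lists, in Polars? (i.e., sum/extend the lists instead of listing them)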

Question

I have data of this kind:

import polars as pl

df = pl.DataFrame(
    {
        "Case": ["case1", "case1"],
        "List": [["x1", "x2"], ["x3", "x4"]],
    }
)

I want to group them so that the lists are appended to one another (like Python's list .extend method).

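I tried flattening the lists after the groupby, but it is very slow, and with too many rows my computer (macOS M1) crashes, so it does not scale (I am dealing with millions of rows). See the code below: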

df.group_by("Case").agg(
    pl.col("List")
).with_columns(
    pl.col("List").list.eval(pl.element().explode()),
)

Is there a simpler and/or more efficient way to do this?

I was hoping that something like this:

df.group_by("Case").agg(
    pl.col("List").sum_list()
)

would give me the following, in reasonable time and in a scalable way:

shape: (1, 2)
┌───────┬────────────────────────┐
│ Case  ┆ List                   │
│ ---   ┆ ---                    │
│ str   ┆ list[str]              │
╞═══════╪════════════════════════╡
│ case1 ┆ ["x1", "x2", ... "x4"] │
└───────┴────────────────────────┘

FYI: the equivalent in pandas is very simple:

df.groupby("Case").agg(sum)

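because pandas handles the sum of lists. However, its performance is not adequate for my use case.

Update: after opening an issue here: https://github.com/pola-rs/polars/issues/6188, the solution, as far as I understand it, seems to be to use the str.concat function instead of lists.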

Answer 1

This will give you the expected output:

In [4]: df.group_by('Case').agg(pl.col('List').explode())
Out[4]:
shape: (1, 2)
┌───────┬────────────────────────┐
│ Case  ┆ List                   │
│ ---   ┆ ---                    │
│ str   ┆ list[str]              │
╞═══════╪════════════════════════╡
│ case1 ┆ ["x1", "x2", ... "x4"] │
└───────┴────────────────────────┘

Answer 2

After discussing the issue on the Polars repository (https://github.com/pola-rs/polars/issues/6188), here is a code example that uses string concatenation to achieve this:

import polars as pl

df = pl.DataFrame(
    {
        "Case": ["case1", "case1"],
        "List": ["x1, x2", "x3, x4"],  # shape your data as separator-delimited strings
    }
).group_by("Case").agg(
    pl.col("List").str.concat(", ")  # use the concat function with your separator
).with_columns(pl.col("List").str.split(", "))  # split to get a list back

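So you simply concatenate the strings all the way through, then split the result on the separator you used. This does not overload memory or time. It is not a perfect solution, since it is somewhat specific, but at least it works for me.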

Answer 3

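(
    df
     # join each row's list into one comma-separated string
     .with_columns(
         pl.col("List").list.join(",")
     )
     # collect the joined strings of each group into a list
     .group_by("Case")
     .agg(
         pl.col("List")
     )
     # join that list into a single string per group
     .with_columns(
         pl.col("List").list.join(",")
     )
     # split back into one flat list; list.unique() also drops duplicate elements
     .with_columns(
         pl.col("List").str.split(",").list.unique().alias("List")
     )
)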
