Reducing code/expression duplication when using Polars `with_columns`?

Question

Consider some Polars code like this:

df.with_columns(
    pl.date_ranges(
        pl.col("current_start"), pl.col("current_end"), "1mo", closed="left"
    ).alias("current_tpoints")
).drop("current_start", "current_end").with_columns(
    pl.date_ranges(
        pl.col("history_start"), pl.col("history_end"), "1mo", closed="left"
    ).alias("history_tpoints")
).drop(
    "history_start", "history_end"
)

The key thing to note here is the duplication between the history_* and current_* columns. I could reduce the duplication by doing this:

for x in ["history", "current"]:
    fstring = f"{x}" + "_{other}"
    start = fstring.format(other="start")
    end = fstring.format(other="end")
    df = df.with_columns(
        pl.date_ranges(
            pl.col(start),
            pl.col(end),
            "1mo",
            closed="left",
        ).alias(fstring.format(other="tpoints"))
    ).drop(start, end)

But is there another approach I should consider to reduce the duplication?

Answer 1

Since you probably don't need any of the original columns, you can use select() instead of with_columns(), so you don't need to drop() the columns afterwards.

You can loop over the column names inside select() / with_columns():

df.select(
    pl.date_ranges(
        pl.col(f"{c}_start"), pl.col(f"{c}_end"), "1mo", closed="left"
    ).alias(f"{c}_tpoints") for c in ["current", "history"]
)

Why this works:

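According to the documentation, both select() and with_columns() accept *exprs: IntoExpr | Iterable[IntoExpr], which means a variable number of arguments, each of which can be either a single expression or an iterable of expressions.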

This is exactly what a list comprehension lets us do: we simply create a list of expressions.

[
    pl.date_ranges(
        pl.col(f"{c}_start"), pl.col(f"{c}_end"), "1mo", closed="left"
    ).alias(f"{c}_tpoints") for c in ["current", "history"]
]

[<Expr ['col("current_start").date_rang…'] at 0x206D93030E0>,
 <Expr ['col("history_start").date_rang…'] at 0x206D8F85520>]

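We can then pass it to the Polars method. Note that the final answer has no square brackets. That's because we don't actually need a list of expressions, just an iterable (in this case, a generator).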
