Polars str.starts_with() with values from another column

Question

I have a Polars DataFrame, for example:

>>> df = pl.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['app', 'nop', 'cap', 'tab']})
>>> df
shape: (4, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ a   ┆ app │
│ b   ┆ nop │
│ c   ┆ cap │
│ d   ┆ tab │
└─────┴─────┘

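I am trying to get a third column C that is True if the string in column B starts with the string in column A of the same row, and False otherwise. So in the example above, I would expect: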

┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

I know about the df['B'].str.starts_with() function, but passing in a column yields:

>>> df['B'].str.starts_with(pl.col('A'))
...  # Some stuff here.
TypeError: argument 'sub': 'Expr' object cannot be converted to 'PyString'

Is there a way to do this? In pandas, you would do:

df.apply(lambda d: d['B'].startswith(d['A']), axis=1)

Answer 1

Expression support for .str.starts_with() was added in pull/6355 as part of the Polars 0.15.17 release.

df.with_columns(pl.col("B").str.starts_with(pl.col("A")).alias("C"))

shape: (4, 3)
┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

Answer 2

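OK, after some experimenting, this works, but I am fairly sure it uses Python strings under the hood (judging by the function name startswith), so it is not optimized: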

>>> pl.concat((df, df.apply(lambda d: d[1].startswith(d[0]))), how='horizontal')
shape: (4, 3)
┌─────┬─────┬───────┐
│ A   ┆ B   ┆ apply │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

I will file a feature request with Polars to see whether this can be improved.

Answer 3

With Polars >= 0.13.16, using struct is another option. However, like the other apply-based answer, this approach also uses Python's str.startswith rather than polars.Expr.str.starts_with.

Code:

import polars as pl

df = pl.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['app', 'nop', 'cap', 'tab']})

df.with_column(
    pl.struct(['A', 'B']).apply(lambda r: r['B'].startswith(r['A'])).alias('C')
)

Output:

┌─────┬─────┬───────┐
│ A   ┆ B   ┆ C     │
│ --- ┆ --- ┆ ---   │
│ str ┆ str ┆ bool  │
╞═════╪═════╪═══════╡
│ a   ┆ app ┆ true  │
│ b   ┆ nop ┆ false │
│ c   ┆ cap ┆ true  │
│ d   ┆ tab ┆ false │
└─────┴─────┴───────┘

Reference:

How to write a Polars custom apply function that does the processing row by row?
