I have managed to solve this problem in two steps.
import polars as pl

text = "a brown fox jumps over a lazy dog's head"
step = 3
df = pl.DataFrame({"a": text.split(" ")})

# Step 1: take every third row, starting at offsets 0, 1 and 2.
first = df.filter(pl.int_range(pl.len()) % step == 0)
second = df.filter(pl.int_range(pl.len()) % step == 1)
third = df.filter(pl.int_range(pl.len()) % step == 2)

# Step 2: put the three slices side by side as columns.
dff = pl.DataFrame(
    {
        "first": first["a"],
        "second": second["a"],
        "third": third["a"],
    }
)
print(dff)
shape: (3, 3)
┌───────┬────────┬───────┐
│ first ┆ second ┆ third │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═══════╪════════╪═══════╡
│ a ┆ brown ┆ fox │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ jumps ┆ over ┆ a │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ lazy ┆ dog's ┆ head │
└───────┴────────┴───────┘
My impression is that this should be easy to solve with a single expression chain, but I have not managed to do it. Any suggestions?
text = "a brown fox jumps over a lazy dog's head"
step = 3
df = pl.DataFrame({"a": text.split(" ")})

(
    # Label each row with the index of the group of `step` rows it belongs to.
    df.with_columns((pl.int_range(pl.len()) // step).alias("step"))
    .group_by("step", maintain_order=True)
    .agg(
        # Within each group, pick the i-th word and name the column after it.
        pl.col("a").get(i).alias(name)
        for i, name in enumerate(["first", "second", "third"])
    )
)
shape: (3, 4)
┌──────┬───────┬────────┬───────┐
│ step ┆ first ┆ second ┆ third │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str │
╞══════╪═══════╪════════╪═══════╡
│ 0 ┆ a ┆ brown ┆ fox │
│ 1 ┆ jumps ┆ over ┆ a │
│ 2 ┆ lazy ┆ dog's ┆ head │
└──────┴───────┴────────┴───────┘
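The index arithmetic behind both versions can be checked without polars. The sketch below is plain Python, not part of either post, and assumes the word count is a multiple of `step`, as it is in the example. Word `i` ends up in row `i // step` (the `step` grouping key above) and in column `i % step` (the filter condition used in the question). The name `rows` is introduced here only for illustration.

```python
text = "a brown fox jumps over a lazy dog's head"
step = 3
words = text.split(" ")

# Cut the word list into consecutive chunks of `step` words; each chunk is one row.
rows = [words[start:start + step] for start in range(0, len(words), step)]

# Word i sits in row i // step and column i % step.
for i, word in enumerate(words):
    assert rows[i // step][i % step] == word

for row in rows:
    print(row)
```

If the number of words were not a multiple of `step`, the last chunk would be shorter than the others. That case is not covered here.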