I have the following code:
```python
import polars as pl
import datetime as dt
from dateutil.relativedelta import relativedelta


def get_3_month_splits(product: str) -> list[str]:
    # 'CHECK.GB.202403.12' -> front='CHECK.GB', start_dt='202403', total_m='12'
    front, start_dt, total_m = product.rsplit('.', 2)
    start_dt = dt.datetime.strptime(start_dt, '%Y%m')
    total_m = int(total_m)
    # One 3-month product per quarter: CHECK.GB.202403.3, CHECK.GB.202406.3, ...
    return [f'{front}.{(start_dt + relativedelta(months=m)).strftime("%Y%m")}.3' for m in range(0, total_m, 3)]


df = pl.DataFrame({
    'product': ['CHECK.GB.202403.12', 'CHECK.DE.202506.6', 'CASH.US.202509.12'],
    'qty': [3, 6, -3],
    'price': [100, 102, 95],
})
print(df)

# One output row per split, copying every other field of the source row
df2 = pl.DataFrame([
    {'product_split': split} | d
    for d in df.iter_rows(named=True)
    for split in get_3_month_splits(d['product'])
])
print(df2)
```
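Essentially, I want to expand each row by splitting `product` into the more granular `product_split` column, while keeping all the other fields (such as `qty` and `price`) unchanged.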
The code above does this, but is there a more native way to do it with `with_columns`?
It looks like you want to create date ranges. You can use:
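```python
# "CHECK.GB.202403.12" -> struct with fields CHECK, GB, 202403, 12
splits = pl.col("product").str.splitn(".", 4)
# Second-to-last field, parsed as a date
start_dt = splits.struct[-2].str.to_date("%Y%m")
# Last field: the number of months (still a string)
total_m = splits.struct[-1]
(
    df.with_columns(
        pl.date_ranges(
            start_dt,
            # end = start + total_m months; within each total_m group
            # every value equals the first one
            start_dt.dt.offset_by(pl.format("{}mo", total_m.first())),
            interval="3mo",
            closed="left"
        )
        .over(total_m)
        .alias("date_ranges")
    )
    .explode("date_ranges")
)
```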
```
shape: (10, 4)
┌────────────────────┬─────┬───────┬─────────────┐
│ product            ┆ qty ┆ price ┆ date_ranges │
│ ---                ┆ --- ┆ ---   ┆ ---         │
│ str                ┆ i64 ┆ i64   ┆ date        │
╞════════════════════╪═════╪═══════╪═════════════╡
│ CHECK.GB.202403.12 ┆ 3   ┆ 100   ┆ 2024-03-01  │
│ CHECK.GB.202403.12 ┆ 3   ┆ 100   ┆ 2024-06-01  │
│ CHECK.GB.202403.12 ┆ 3   ┆ 100   ┆ 2024-09-01  │
│ CHECK.GB.202403.12 ┆ 3   ┆ 100   ┆ 2024-12-01  │
│ CHECK.DE.202506.6  ┆ 6   ┆ 102   ┆ 2025-06-01  │
│ CHECK.DE.202506.6  ┆ 6   ┆ 102   ┆ 2025-09-01  │
│ CASH.US.202509.12  ┆ -3  ┆ 95    ┆ 2025-09-01  │
│ CASH.US.202509.12  ┆ -3  ┆ 95    ┆ 2025-12-01  │
│ CASH.US.202509.12  ┆ -3  ┆ 95    ┆ 2026-03-01  │
│ CASH.US.202509.12  ┆ -3  ┆ 95    ┆ 2026-06-01  │
└────────────────────┴─────┴───────┴─────────────┘
```