Say I have
data = {'id': [1, 1, 1, 2, 2, 2],
        'd': [1, 2, 3, 1, 2, 3],
        'sales': [1, 4, 2, 3, 1, 2]}
My end goal is to be able to translate the following:
import duckdb
import polars as pl
df = pl.DataFrame(data)
duckdb.sql("""
    select *, case when count(sales) over w then sum(sales) over w else null end as rolling_sales
    from df
    window w as (partition by id order by d rows between 1 preceding and current row)
""")
Here is how far I've got:
rel = duckdb.table("df")
rel.sum(
    "sales",
    projected_columns="*",
    window_spec="over (partition by id order by d rows between 1 preceding and current row) as rolling_sales",
)
I think this is more readable than a giant SQL string.
But how can I get the
case when then
part in there? I've looked at https://duckdb.org/docs/api/python/relational_api.html and there is no mention of "case".
I'm not sure this is the best way to do it, and I think plain SQL (or the Polars API) would be more readable, but you can do it like this:
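# Step 1: windowed row count per (id, d) frame, stored temporarily as rolling_sale
rel = rel.count(
    "*",
    projected_columns="*",
    window_spec="over (partition by id order by d rows between 1 preceding and current row) as rolling_sale",
)
# Step 2: apply the CASE WHEN through a plain select, replacing the count column
rel = rel.select(
    "* exclude(rolling_sale), case when rolling_sale then sales else null end as rolling_sale"
)
# Step 3: windowed sum over the CASE result, replacing the intermediate column
rel = rel.sum(
    "rolling_sale",
    projected_columns="* exclude(rolling_sale)",
    window_spec="over (partition by id order by d rows between 1 preceding and current row) as rolling_sale",
)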
┌───────┬───────┬───────┬──────────────┐
│  id   │   d   │ sales │ rolling_sale │
│ int64 │ int64 │ int64 │    int128    │
├───────┼───────┼───────┼──────────────┤
│     1 │     1 │     1 │            1 │
│     1 │     2 │     4 │            5 │
│     1 │     3 │     2 │            6 │
│     2 │     1 │     3 │            3 │
│     2 │     2 │     1 │            4 │
│     2 │     3 │     2 │            3 │
└───────┴───────┴───────┴──────────────┘
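As a cross-check on the table above, here is a minimal pure-Python sketch of what the window in the question computes: for each row, the sum of non-null sales over the previous row and the current row within the same id, ordered by d, or null when the frame holds no non-null sales. The function name rolling_sales and its preceding parameter are made up for illustration; nothing here is part of DuckDB or Polars.

```python
from collections import defaultdict

data = {
    "id": [1, 1, 1, 2, 2, 2],
    "d": [1, 2, 3, 1, 2, 3],
    "sales": [1, 4, 2, 3, 1, 2],
}


def rolling_sales(data, preceding=1):
    """Hypothetical helper mimicking
    `rows between <preceding> preceding and current row`,
    partitioned by id and ordered by d."""
    n = len(data["id"])

    # Group row indices by partition key (id).
    partitions = defaultdict(list)
    for i in range(n):
        partitions[data["id"][i]].append(i)

    result = [None] * n
    for rows in partitions.values():
        # Order rows inside the partition by d.
        rows.sort(key=lambda i: data["d"][i])
        for pos, i in enumerate(rows):
            # Frame: up to `preceding` earlier rows plus the current row.
            frame = rows[max(0, pos - preceding): pos + 1]
            # Like SQL count(sales)/sum(sales), skip nulls (None).
            values = [data["sales"][j] for j in frame if data["sales"][j] is not None]
            # "case when count(...) then sum(...) else null end"
            result[i] = sum(values) if values else None
    return result


print(rolling_sales(data))  # [1, 5, 6, 3, 4, 3]
```

The printed list matches the rolling_sale column above row for row, which is a quick way to confirm that a relational-API rewrite still agrees with the original query on this sample data.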