Get row number per frame from a rolling window

I want to get the row number per frame (rather than per partition) from a rolling window in SQL/DuckDB.

Given this data:

customer_id,date
ca,2024-04-03
ca,2024-04-04
ca,2024-04-04
ca,2024-04-11
cb,2024-04-02
cb,2024-04-02
cb,2024-04-03
cb,2024-05-13

and this query:

SELECT
    customer_id,
    date,
    row_number() OVER win AS row_by_partition
FROM 'example.csv'
WINDOW win AS (
    PARTITION BY customer_id
    ORDER BY date ASC
    RANGE BETWEEN CURRENT ROW
      AND INTERVAL 1 WEEK FOLLOWING)

I get the row number per partition:

┌─────────────┬────────────┬──────────────────┐
│ customer_id │    date    │ row_by_partition │
│   varchar   │    date    │      int64       │
├─────────────┼────────────┼──────────────────┤
│ ca          │ 2024-04-03 │                1 │
│ ca          │ 2024-04-04 │                2 │
│ ca          │ 2024-04-04 │                3 │
│ ca          │ 2024-04-11 │                4 │
│ cb          │ 2024-04-02 │                1 │
│ cb          │ 2024-04-02 │                2 │
│ cb          │ 2024-04-03 │                3 │
│ cb          │ 2024-05-13 │                4 │
└─────────────┴────────────┴──────────────────┘

However, I want the row number per frame:

┌─────────────┬────────────┬──────────────┐
│ customer_id │    date    │ row_by_frame │
│   varchar   │    date    │    int64     │
├─────────────┼────────────┼──────────────┤
│ ca          │ 2024-04-03 │            1 │
│ ca          │ 2024-04-04 │            1 │
│ ca          │ 2024-04-04 │            2 │
│ ca          │ 2024-04-11 │            1 │
│ cb          │ 2024-04-02 │            1 │
│ cb          │ 2024-04-02 │            2 │
│ cb          │ 2024-04-03 │            1 │
│ cb          │ 2024-05-13 │            1 │
└─────────────┴────────────┴──────────────┘
sql duckdb
1 Answer

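You could probably compute this in two steps: first, collect all the data within the frame, then compute the row index within that frame (here, by numbering the rows whose frames are identical). I'm adding an example based on your data; you may need to adjust it depending on how unique your data is.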

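import duckdb

# Assumption: df holds the question's data; here it is loaded from example.csv.
df = duckdb.read_csv("example.csv")

duckdb.sql("""
-- Step 1: collect the dates that fall inside each row's frame.
with cte as (
    select
        customer_id,
        date,
        array_agg(date) over win as dates
    from df
    window win as (
        partition by customer_id
        order by date asc
        range between current row and interval 1 week following
    )
)
-- Step 2: number the rows that share the same customer and frame contents.
select
    customer_id,
    date,
    row_number() over(partition by customer_id, dates) as row_by_frame
from cte
""")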
┌─────────────┬────────────┬──────────────┐
│ customer_id │    date    │ row_by_frame │
│   varchar   │    date    │    int64     │
├─────────────┼────────────┼──────────────┤
│ cb          │ 2024-04-02 │            1 │
│ cb          │ 2024-04-02 │            2 │
│ cb          │ 2024-04-03 │            1 │
│ ca          │ 2024-04-04 │            1 │
│ ca          │ 2024-04-04 │            2 │
│ cb          │ 2024-05-13 │            1 │
│ ca          │ 2024-04-11 │            1 │
│ ca          │ 2024-04-03 │            1 │
└─────────────┴────────────┴──────────────┘
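
To make the two steps easier to follow outside of SQL, here is a minimal pure-Python sketch of the same idea. It is not how DuckDB evaluates the query; `row_by_frame` and `width` are hypothetical names introduced for illustration, and the rows are the sample data from the question. It assumes the frame covers all rows of the same customer whose date lies between the current row's date and one week later, peers included.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (customer_id, date).
rows = [
    ("ca", date(2024, 4, 3)),
    ("ca", date(2024, 4, 4)),
    ("ca", date(2024, 4, 4)),
    ("ca", date(2024, 4, 11)),
    ("cb", date(2024, 4, 2)),
    ("cb", date(2024, 4, 2)),
    ("cb", date(2024, 4, 3)),
    ("cb", date(2024, 5, 13)),
]


def row_by_frame(rows, width=timedelta(weeks=1)):
    """Hypothetical helper: number rows among rows that share the same frame."""
    seen = defaultdict(int)  # (customer_id, frame) -> rows numbered so far
    result = []
    for customer_id, d in rows:
        # Step 1: collect the frame, i.e. all dates of this customer in
        # [d, d + width]. Rows with the same date (peers) are included.
        frame = tuple(sorted(
            other for cid, other in rows
            if cid == customer_id and d <= other <= d + width
        ))
        # Step 2: number the row within the group of rows with an identical frame.
        seen[(customer_id, frame)] += 1
        result.append((customer_id, d, seen[(customer_id, frame)]))
    return result


numbered = row_by_frame(rows)
for customer_id, d, n in numbered:
    print(customer_id, d.isoformat(), n)
```

Because each frame starts at the current row's date, two rows of the same customer share a frame only when they share a date, which is why the numbering restarts at each new date in this data.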