How to build an SCD2 table from a partitioned table in PostgreSQL/ClickHouse?

Question

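I have a table partitioned by `ds`, a date column, with many feature columns. Not all columns change every day, so most rows are just duplicates of the previous day's row. I want to build an SCD2 table from this existing (partitioned) table

and get `dt_start`, the date from which a record is valid, and `dt_end`, the date on which that period ends.

If the record is currently valid, then dt_end = NULL.

I was thinking of something like a window function:

ds as dt_start, __(ds) over(partition by user_id, country_id order by ds) as dt_end, ... group by all columns in the table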

CREATE TABLE public.app(
    ds date NULL,
    user_id int4 NULL,
    country_id int2 NULL,
    n_sessions_1d int2 NULL,
    n_sessions_3d int2 NULL,
    n_sessions_1w int2 NULL,
    n_sessions_2w int2 NULL,
    n_sessions_1m int2 NULL,
    total_time_spent_1d int4 NULL,
    total_time_spent_3d int4 NULL,
    total_time_spent_1w int4 NULL,
    total_time_spent_2w int4 NULL,
    total_time_spent_1m int4 NULL,
    is_subscription_1d int2 NULL,
    is_subscription_3d int2 NULL
)
PARTITION BY RANGE (ds);
CREATE INDEX idx ON ONLY public.app USING btree (user_id, country_id);

Answer 1

You can use a fairly simple aggregation (demo at db<>fiddle):

select min(ds) as dt_start 
     , max(ds) as dt_end
     , user_id,country_id,n_sessions_1d,n_sessions_3d,n_sessions_1w,n_sessions_2w,n_sessions_1m,total_time_spent_1d,total_time_spent_3d,total_time_spent_1w,total_time_spent_2w,total_time_spent_1m,is_subscription_1d,is_subscription_3d
from public.app
group by user_id,country_id,n_sessions_1d,n_sessions_3d,n_sessions_1w,n_sessions_2w,n_sessions_1m,total_time_spent_1d,total_time_spent_3d,total_time_spent_1w,total_time_spent_2w,total_time_spent_1m,is_subscription_1d,is_subscription_3d;

Technically, your window function idea would work, but it is considerably more complicated to implement:

You can use `first_value()` as `dt_start` and `last_value()` as `dt_end`. Extend the partition to include every column except `ds`. That way, all identical rows from different dates share one partition.

Widen the frame to `between unbounded preceding and unbounded following` to override the default `between unbounded preceding and current row`, which is what you get when the window has an `order by` and no frame definition.
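To see the difference between the two frames, here is a small self-contained sketch. The three sample dates and the aliases `default_frame` and `full_frame` are made up for illustration and do not come from `public.app`.

```sql
-- Compare last_value() under the default frame and under the widened frame.
-- The inline sample dates are illustrative only.
select ds
     , last_value(ds) over (order by ds) as default_frame  -- frame stops at the current row
     , last_value(ds) over (order by ds
                            rows between unbounded preceding
                                     and unbounded following) as full_frame
from (values (date '2024-01-01')
           , (date '2024-01-02')
           , (date '2024-01-03')) as t(ds)
order by ds;
-- default_frame repeats each row's own ds;
-- full_frame is 2024-01-03 on every row.
```

With the default frame, `last_value()` can never see past the current row, which is why the frame has to be stated explicitly for `dt_end`.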

With the widened frame, the query yields the result you want, but repeated for every input row. Apply `distinct` to keep only one row per group.

select distinct
       first_value(ds)over w1 as dt_start 
     , last_value(ds)over w1 as dt_end
     , user_id,country_id,n_sessions_1d,n_sessions_3d,n_sessions_1w,n_sessions_2w,n_sessions_1m,total_time_spent_1d,total_time_spent_3d,total_time_spent_1w,total_time_spent_2w,total_time_spent_1m,is_subscription_1d,is_subscription_3d
from public.app
window w1 as (partition by user_id, country_id, n_sessions_1d,n_sessions_3d,n_sessions_1w,n_sessions_2w,n_sessions_1m,total_time_spent_1d,total_time_spent_3d,total_time_spent_1w,total_time_spent_2w,total_time_spent_1m,is_subscription_1d,is_subscription_3d
              order by ds
              rows between unbounded preceding 
                       and unbounded following);
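One caveat: both queries above group purely by value. If a user's attributes change and later return to an earlier state (A, then B, then A again), the two A periods collapse into a single row that spans B, and `dt_end` is never NULL. A possible way to keep such periods apart is the gaps-and-islands pattern sketched below. It is not part of the original answer, it targets PostgreSQL only, and it assumes `user_id` identifies the entity and that there is exactly one row per `user_id` and `ds`. The names `numbered`, `grp` and `versions` are hypothetical. Here `dt_end` is the start date of the next version (an exclusive end), which comes out as NULL for the version that is still current.

```sql
with numbered as (
    select a.*
         -- Position in the user's full history minus position among rows with
         -- identical attributes. The difference stays constant across a run of
         -- consecutive identical days and changes when the same values reappear
         -- after an interruption.
         , row_number() over (partition by user_id order by ds)
         - row_number() over (partition by user_id, country_id,
                                  n_sessions_1d, n_sessions_3d, n_sessions_1w,
                                  n_sessions_2w, n_sessions_1m,
                                  total_time_spent_1d, total_time_spent_3d,
                                  total_time_spent_1w, total_time_spent_2w,
                                  total_time_spent_1m,
                                  is_subscription_1d, is_subscription_3d
                              order by ds) as grp
    from public.app as a
),
versions as (
    -- One row per uninterrupted run of identical attribute values.
    select min(ds) as dt_start
         , user_id, country_id
         , n_sessions_1d, n_sessions_3d, n_sessions_1w, n_sessions_2w, n_sessions_1m
         , total_time_spent_1d, total_time_spent_3d, total_time_spent_1w
         , total_time_spent_2w, total_time_spent_1m
         , is_subscription_1d, is_subscription_3d
    from numbered
    group by user_id, country_id
           , n_sessions_1d, n_sessions_3d, n_sessions_1w, n_sessions_2w, n_sessions_1m
           , total_time_spent_1d, total_time_spent_3d, total_time_spent_1w
           , total_time_spent_2w, total_time_spent_1m
           , is_subscription_1d, is_subscription_3d
           , grp
)
select dt_start
       -- Next version's start date for the same user; NULL for the latest version.
     , lead(dt_start) over (partition by user_id order by dt_start) as dt_end
     , user_id, country_id
     , n_sessions_1d, n_sessions_3d, n_sessions_1w, n_sessions_2w, n_sessions_1m
     , total_time_spent_1d, total_time_spent_3d, total_time_spent_1w
     , total_time_spent_2w, total_time_spent_1m
     , is_subscription_1d, is_subscription_3d
from versions
order by user_id, dt_start;
```

For a user whose rows on five consecutive days carry the values A, A, B, A, A, this sketch returns three versions (A, B, A) instead of two, and only the last one has `dt_end` NULL.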
