I have a customer_audit table that records the INSERTs and UPDATEs made to the customer table.
customer_audit
id | operation | timestamp | customer_id | address1 | address2 |
---|---|---|---|---|---|
1 | I | 2024-10-05 | 100 | link st | number 1 |
2 | U | 2024-10-06 | 100 | link st | number 2 |
3 | U | 2024-10-07 | 100 | link road | number 2 |
4 | U | 2024-10-08 | 100 | link road | number 200 |
5 | I | 2024-10-06 | 200 | park st | number 20 |
6 | U | 2024-10-08 | 200 | park st | number 200 |
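The expected SQL output covers the columns we are interested in, address1 and address2 in this example. For each of those columns it should list every historical value separately, with from_date and to_date derived from the timestamp column.
For example, for customer_id 100: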
customer_id | column_name | column_value | from_date | to_date |
---|---|---|---|---|
100 | address1 | link st | 2024-10-05 | 2024-10-07 |
100 | address1 | link road | 2024-10-07 | null (meaning the column value is the current value) |
100 | address2 | number 1 | 2024-10-05 | 2024-10-06 |
100 | address2 | number 2 | 2024-10-06 | 2024-10-08 |
100 | address2 | number 200 | 2024-10-08 | null |
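To pin down the rule behind the expected output, here is a minimal Python sketch (not part of the original question) that derives the same intervals from the sample rows. The names `AUDIT_ROWS`, `COLUMN_INDEX`, and `column_history` are made up for illustration, and the sketch assumes that `I`/`U` stand for insert/update and that rows are ordered by timestamp, then id, within each customer.

```python
from itertools import groupby

# Sample rows copied from the customer_audit table above:
# (id, operation, timestamp, customer_id, address1, address2)
AUDIT_ROWS = [
    (1, "I", "2024-10-05", 100, "link st", "number 1"),
    (2, "U", "2024-10-06", 100, "link st", "number 2"),
    (3, "U", "2024-10-07", 100, "link road", "number 2"),
    (4, "U", "2024-10-08", 100, "link road", "number 200"),
    (5, "I", "2024-10-06", 200, "park st", "number 20"),
    (6, "U", "2024-10-08", 200, "park st", "number 200"),
]

# Hypothetical helper mapping: tuple position of each tracked column.
COLUMN_INDEX = {"address1": 4, "address2": 5}


def column_history(rows, column_name):
    """Return (customer_id, column_name, value, from_date, to_date) tuples."""
    idx = COLUMN_INDEX[column_name]
    # Order by customer, then timestamp, then id as a tie-breaker.
    ordered = sorted(rows, key=lambda r: (r[3], r[2], r[0]))
    history = []
    for customer_id, group in groupby(ordered, key=lambda r: r[3]):
        # Keep only the rows where the tracked value differs from the previous one.
        changes = []
        for row in group:
            if not changes or changes[-1][0] != row[idx]:
                changes.append((row[idx], row[2]))
        # Each value is valid until the next change; the last one is still current.
        for pos, (value, from_date) in enumerate(changes):
            to_date = changes[pos + 1][1] if pos + 1 < len(changes) else None
            history.append((customer_id, column_name, value, from_date, to_date))
    return history


for name in ("address1", "address2"):
    for record in column_history(AUDIT_ROWS, name):
        print(record)
```

A `to_date` of `None` plays the role of null in the table above, i.e. the value is the current one.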
I solved this by slightly tweaking mcwolf's query, like this:
select customer_id,
       name,
       'address1' as attribute_name,
       address1 as "attribute_value",
       min(from_date) as "valid_from",
       max(to_date) as "valid_to"
from (
    -- one row per audit entry, valid until the customer's next audit entry
    select customer_id,
           name,
           address1,
           "mod_timestamp" as from_date,
           -- the latest entry gets the sentinel date 9999-12-31 instead of null
           coalesce(lead("mod_timestamp") over change_window, to_date('99991231', 'yyyymmdd')) as to_date,
           row_number() over (partition by customer_id, address1 order by mod_timestamp) as rownum
    from customer_audit
    window change_window as (partition by customer_id order by mod_timestamp)
    order by customer_id, mod_timestamp
) a
-- collapse all entries that carry the same address1 into one validity range
group by customer_id, name, attribute_name, "attribute_value"
order by customer_id, "valid_from";
... here is another way to do it ... (see the comments in the code)
WITH
dates AS -- track changes of addresses ( ..._rn = 1 )
( Select ca.id, customer_id, "timestamp" as ts,
address1,
Row_Number() Over(Partition By customer_id, address1 Order By id) addr1_rn,
address2,
Row_Number() Over(Partition By customer_id, address2 Order By id) addr2_rn
From customer_audit ca
Order By customer_id, id
),
grid AS -- get changes of addr1 Union All changes of addr2
( SELECT f.id, f.customer_id,
'address1' as column_name, f.address1 as column_value, f.ts
FROM ( Select * From dates Where addr1_rn = 1 ) f
UNION ALL
SELECT f.id, f.customer_id,
'address2', f.address2, f.ts
FROM ( Select * From dates Where addr2_rn = 1 ) f
)
-- M a i n S Q L :
Select customer_id, column_name, column_value,
ts as from_date, Lead(ts) Over(Partition By customer_id, column_name
Order By id) as to_date
From grid
Order By customer_id, column_name, id
/* R e s u l t :
customer_id column_name column_value from_date to_date
----------- ----------- -------------- ---------------------- ----------------------
100 address1 link st 2024-10-05 00:00:00 2024-10-07 00:00:00
100 address1 link road 2024-10-07 00:00:00 null
100 address2 number 1 2024-10-05 00:00:00 2024-10-06 00:00:00
100 address2 number 2 2024-10-06 00:00:00 2024-10-08 00:00:00
100 address2 number 200 2024-10-08 00:00:00 null
--
200 address1 park st 2024-10-06 00:00:00 null
200 address2 number 20 2024-10-06 00:00:00 2024-10-08 00:00:00
200 address2 number 200 2024-10-08 00:00:00 null */
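One case the sample data does not exercise is a value that later reverts to an earlier one (for example link st, then link road, then link st again). Filtering on `Row_Number() = 1` per value keeps only the first occurrence of each value, so such a revert would not appear as a new period. The sketch below is one possible variant, not the query above: it compares each row with the previous one via `LAG` instead. It runs through Python's `sqlite3` module purely so it can be executed; it assumes SQLite 3.25 or later for window functions, row 7 is a hypothetical extra row added to show the revert, and the CTE names `unpivoted` and `flagged` are illustrative. SQLite's null-safe `IS NOT` would be `IS DISTINCT FROM` in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customer_audit (
           id INTEGER PRIMARY KEY, operation TEXT, "timestamp" TEXT,
           customer_id INTEGER, address1 TEXT, address2 TEXT)"""
)
conn.executemany(
    "INSERT INTO customer_audit VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "I", "2024-10-05", 100, "link st", "number 1"),
        (2, "U", "2024-10-06", 100, "link st", "number 2"),
        (3, "U", "2024-10-07", 100, "link road", "number 2"),
        (4, "U", "2024-10-08", 100, "link road", "number 200"),
        (5, "I", "2024-10-06", 200, "park st", "number 20"),
        (6, "U", "2024-10-08", 200, "park st", "number 200"),
        # Hypothetical row, not in the original data: address1 reverts to 'link st'.
        (7, "U", "2024-10-09", 100, "link st", "number 200"),
    ],
)

QUERY = """
WITH unpivoted AS (   -- one row per audit entry and tracked column
    SELECT id, customer_id, "timestamp" AS ts,
           'address1' AS column_name, address1 AS column_value
    FROM customer_audit
    UNION ALL
    SELECT id, customer_id, "timestamp", 'address2', address2
    FROM customer_audit
),
flagged AS (          -- attach the previous value of the same column
    SELECT *,
           LAG(column_value) OVER (PARTITION BY customer_id, column_name ORDER BY id) AS prev_value,
           ROW_NUMBER()      OVER (PARTITION BY customer_id, column_name ORDER BY id) AS rn
    FROM unpivoted
)
SELECT customer_id, column_name, column_value, ts AS from_date,
       LEAD(ts) OVER (PARTITION BY customer_id, column_name ORDER BY id) AS to_date
FROM ( -- keep the first row and every row whose value differs from the previous one
       SELECT * FROM flagged WHERE rn = 1 OR column_value IS NOT prev_value ) AS changes
ORDER BY customer_id, column_name, id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the hypothetical row 7 included, customer 100 gets three address1 periods, and the second link st period is the current one. Without row 7 the result should match the output listed above.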