Suppose I have two tables, a and b, each containing a series of dates:
Table a:
day_date |
---|
2020-1-1 |
2020-1-2 |
2020-1-3 |
2020-1-4 |
2020-1-5 |
2020-1-6 |
2020-1-7 |
2020-1-8 |
2020-1-9 |
2020-1-10 |
and table b:
id | some_date |
---|---|
0 | 2020-1-3 |
0 | 2020-1-6 |
0 | 2020-1-8 |
1 | 2020-1-2 |
1 | 2020-1-5 |
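I want to create a new table c containing day_date, id, and some_date, where some_date now holds only the smallest value above a given threshold (here, day_date), i.e.:
day_date | id | next_date |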
---|---|---|
2020-1-1 | 0 | 2020-1-3 |
2020-1-2 | 0 | 2020-1-3 |
2020-1-3 | 0 | 2020-1-6 |
2020-1-4 | 0 | 2020-1-6 |
2020-1-5 | 0 | 2020-1-6 |
2020-1-6 | 0 | 2020-1-8 |
2020-1-7 | 0 | 2020-1-8 |
2020-1-8 | 0 | null |
2020-1-9 | 0 | null |
2020-1-10 | 0 | null |
2020-1-1 | 1 | 2020-1-2 |
2020-1-2 | 1 | 2020-1-5 |
2020-1-3 | 1 | 2020-1-5 |
2020-1-4 | 1 | 2020-1-5 |
2020-1-5 | 1 | null |
2020-1-6 | 1 | null |
2020-1-7 | 1 | null |
2020-1-8 | 1 | null |
2020-1-9 | 1 | null |
2020-1-10 | 1 | null |
My idea was to filter the cross join of the two, for example:
CREATE TEMP TABLE some_next AS (
SELECT
day_date,
id,
CASE WHEN some_date > day_date THEN some_date ELSE NULL END AS next_churn_date,
ROW_NUMBER() OVER (PARTITION BY day_date, id ORDER BY some_date ASC) AS rn_next
FROM a CROSS JOIN b
WHERE some_date > day_date OR day_date >= (SELECT MAX(some_date) FROM b bb WHERE bb.id = b.id)
);
CREATE TABLE c AS (
SELECT * FROM some_next WHERE rn_next = 1 ORDER BY id, day_date
);
I'm looking for a simpler solution. Any ideas?
We can start with a cross join to get every combination of id and day.
select
day,
id
from days
cross join ( select distinct id from some_days ) ids
order by id, day;
Then use that to rank the some_days by how close they are to each day (the order by at the end is not necessary).
with days_ids as (
select
day,
id
from days
cross join ( select distinct id from some_days ) ids
)
select
di.day as day,
di.id,
sd.day as some_day,
row_number() over (
partition by di.day, di.id
order by sd.day asc
) as row_num
from days_ids di
left join some_days sd on di.day < sd.day and di.id = sd.id
order by di.id, di.day, row_num
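To see why the left join matters here, the ranking step can be run on a cut-down data set. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) rather than whatever database the question targets, stores dates as zero-padded ISO strings so that text comparison matches date order, and loads just two days and one id.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table days (day text);
    create table some_days (id integer, day text);
    -- Zero-padded ISO dates so string comparison follows date order.
    insert into days values ('2020-01-01'), ('2020-01-08');
    insert into some_days values
        (0, '2020-01-03'), (0, '2020-01-06'), (0, '2020-01-08');
""")

ranked = conn.execute("""
    with days_ids as (
        select day, id
        from days
        cross join ( select distinct id from some_days ) ids
    )
    select
        di.day as day,
        di.id,
        sd.day as some_day,
        row_number() over (
            partition by di.day, di.id
            order by sd.day asc
        ) as row_num
    from days_ids di
    left join some_days sd on di.day < sd.day and di.id = sd.id
    order by di.id, di.day, row_num
""").fetchall()

for row in ranked:
    print(row)
# ('2020-01-01', 0, '2020-01-03', 1)
# ('2020-01-01', 0, '2020-01-06', 2)
# ('2020-01-01', 0, '2020-01-08', 3)
# ('2020-01-08', 0, None, 1)
```

The day with no later some_day still produces one row, with a null some_day and row_num 1, so it survives the filter in the next step. An inner join would drop it.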
And select only the rows where row_num is 1.
with days_ids as (
select
day,
id
from days
cross join ( select distinct id from some_days ) ids
),
matched_days as (
select
di.day as day,
di.id,
sd.day as some_day,
row_number() over (
partition by di.day, di.id
order by sd.day asc
) as row_num
from days_ids di
left join some_days sd on di.day < sd.day and di.id = sd.id
)
select day, id, some_day
from matched_days
where row_num = 1
order by id, day
Demonstration.
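For a local check, the final query can be run against the sample data from the question. This is a minimal sketch, not the linked demonstration: it assumes Python's built-in sqlite3 module (SQLite 3.25 or later), the table names days and some_days used in this answer, and zero-padded ISO date strings in place of a real date type.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("create table days (day text)")
conn.execute("create table some_days (id integer, day text)")

# Sample data from the question, written as zero-padded ISO strings
# so that text comparison follows date order.
conn.executemany(
    "insert into days values (?)",
    [(f"2020-01-{d:02d}",) for d in range(1, 11)],
)
conn.executemany(
    "insert into some_days values (?, ?)",
    [(0, "2020-01-03"), (0, "2020-01-06"), (0, "2020-01-08"),
     (1, "2020-01-02"), (1, "2020-01-05")],
)

query = """
with days_ids as (
    select day, id
    from days
    cross join ( select distinct id from some_days ) ids
),
matched_days as (
    select
        di.day as day,
        di.id,
        sd.day as some_day,
        row_number() over (
            partition by di.day, di.id
            order by sd.day asc
        ) as row_num
    from days_ids di
    left join some_days sd on di.day < sd.day and di.id = sd.id
)
select day, id, some_day
from matched_days
where row_num = 1
order by id, day
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

This prints 20 rows, one per (day, id) pair, with None wherever no later date exists for that id, which matches the table c requested in the question.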