Fill a column with repeating threshold values

Suppose I have two tables, a and b, each containing a series of dates:

a

day_date
2020-1-1
2020-1-2
2020-1-3
2020-1-4
2020-1-5
2020-1-6
2020-1-7
2020-1-8
2020-1-9
2020-1-10

and table b:

id some_date
0 2020-1-3
0 2020-1-6
0 2020-1-8
1 2020-1-2
1 2020-1-5

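I want to create a new table c containing day_date, id and some_date, where some_date now holds only the smallest value for that id that is strictly greater than day_date (the threshold), and is empty when no such value exists, i.e.: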

day_date id next_date
2020-1-1 0 2020-1-3
2020-1-2 0 2020-1-3
2020-1-3 0 2020-1-6
2020-1-4 0 2020-1-6
2020-1-5 0 2020-1-6
2020-1-6 0 2020-1-8
2020-1-7 0 2020-1-8
2020-1-8 0
2020-1-9 0
2020-1-10 0
2020-1-1 1 2020-1-2
2020-1-2 1 2020-1-5
2020-1-3 1 2020-1-5
2020-1-4 1 2020-1-5
2020-1-5 1
2020-1-6 1
2020-1-7 1
2020-1-8 1
2020-1-9 1
2020-1-10 1

My idea was to filter the cross join of the two, for example:

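CREATE TEMP TABLE some_next AS (
  SELECT
    day_date,
    id,
    -- keep some_date only when it lies after day_date
    CASE WHEN some_date > day_date THEN some_date ELSE NULL END AS next_churn_date,
    -- rank the candidate dates per (day_date, id), earliest first
    ROW_NUMBER() OVER (PARTITION BY day_date, id ORDER BY some_date ASC) AS rn_next
  FROM a CROSS JOIN b
  -- the second condition keeps days on or after the last some_date, so they appear with NULL
  WHERE some_date > day_date
     OR day_date >= (SELECT MAX(some_date) FROM b bb WHERE bb.id = b.id)
);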

CREATE TABLE c AS (
  SELECT * FROM some_next WHERE rn_next = 1 ORDER BY id, day_date
);

I'm looking for a simpler solution. Any ideas?

sql postgresql amazon-redshift
1 Answer

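We can start with a cross join to get every combination of id and day. In the queries below, days (column day) plays the role of table a, and some_days (columns id and day) plays the role of table b.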

select
  day,
  id
from days
cross join ( select distinct id from some_days ) ids
order by id, day;
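
To try this and the following queries against the question's data, a setup along these lines could be used. This is only a sketch: the table names days and some_days, the column names, and the column types are assumptions inferred from the queries in this answer, and the dates are written in zero-padded ISO form.

```sql
-- Hypothetical setup; names and types are assumed, not taken from the question.
create table days (day date);
create table some_days (id int, day date);

-- Table a from the question: ten consecutive days.
insert into days (day) values
  ('2020-01-01'), ('2020-01-02'), ('2020-01-03'), ('2020-01-04'), ('2020-01-05'),
  ('2020-01-06'), ('2020-01-07'), ('2020-01-08'), ('2020-01-09'), ('2020-01-10');

-- Table b from the question: the threshold dates per id.
insert into some_days (id, day) values
  (0, '2020-01-03'), (0, '2020-01-06'), (0, '2020-01-08'),
  (1, '2020-01-02'), (1, '2020-01-05');
```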

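Then use that to rank the some_days rows by how close they are to each day (the order by is dropped from the CTE, since it is unnecessary there).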

with days_ids as (
  select
    day,
    id
  from days
  cross join ( select distinct id from some_days ) ids
)
select
  di.day as day,
  di.id,
  sd.day as some_day,
  row_number() over (
    partition by di.day, di.id
    order by sd.day asc
  ) as row_num
from days_ids di
left join some_days sd on di.day < sd.day and di.id = sd.id
order by di.id, di.day, row_num

And select only the rows where row_num is 1.

with days_ids as (
  select
    day,
    id
  from days
  cross join ( select distinct id from some_days ) ids
),
matched_days as (
  select
    di.day as day,
    di.id,
    sd.day as some_day,
    row_number() over (
      partition by di.day, di.id
      order by sd.day asc
    ) as row_num
  from days_ids di
  left join some_days sd on di.day < sd.day and di.id = sd.id
)
select day, id, some_day
from matched_days
where row_num = 1
order by id, day

Demonstration.
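
Since only the earliest later date is needed for each day and id, the ranking step could probably be replaced by a plain aggregate. The following is a sketch of that variant, not part of the original answer. It assumes the same days(day) and some_days(id, day) tables used above; the aliases d, ids and sd are illustrative.

```sql
-- Variant using MIN instead of row_number(); assumes days(day) and some_days(id, day).
select
  d.day,
  ids.id,
  min(sd.day) as some_day  -- NULL when this id has no date after d.day
from days d
cross join ( select distinct id from some_days ) ids
left join some_days sd
  on sd.id = ids.id
 and sd.day > d.day        -- strictly later dates only
group by d.day, ids.id
order by ids.id, d.day;
```

The left join keeps every (day, id) pair even when no later date matches, and MIN over the unmatched NULL row yields NULL, which corresponds to the empty cells in the expected output.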
