SQL - Vertica: how to generate daily rows using the most recent date's data

I have a base table like this:

WITH score_upd (Upd_dt,Url,Score) AS (
          SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
)
SELECT * FROM score_upd;

   Upd_dt       URL    Score
 2019-07-26      A       x
 2019-07-26      B      alpha 
 2019-08-01      A       y
 2019-08-01      B      beta
 2019-08-03      A       z
 2019-08-03      B      gamma

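I want to create a table at the daily, per-URL level, where each new row takes its values from the most recent previous date. The result should look like this: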

score_upd (Upd_dt,Url,Score) AS (
          SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-07-27','A','x'
UNION ALL SELECT DATE '2019-07-27','B','alpha'
UNION ALL SELECT DATE '2019-07-28','A','x'
UNION ALL SELECT DATE '2019-07-28','B','alpha'
UNION ALL SELECT DATE '2019-07-29','A','x'
UNION ALL SELECT DATE '2019-07-29','B','alpha'
UNION ALL SELECT DATE '2019-07-30','A','x'
UNION ALL SELECT DATE '2019-07-30','B','alpha'
UNION ALL SELECT DATE '2019-07-31','A','x'
UNION ALL SELECT DATE '2019-07-31','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-02','A','y'
UNION ALL SELECT DATE '2019-08-02','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
UNION ALL SELECT DATE '2019-08-04','A','z'
UNION ALL SELECT DATE '2019-08-04','B','gamma'
UNION ALL SELECT DATE '2019-08-05','A','z'
UNION ALL SELECT DATE '2019-08-05','B','gamma'
) 

Which looks like:

   Upd_dt       URL    Score 
 2019-07-26      A       x
 2019-07-26      B      alpha 
 2019-07-27      A       x
 2019-07-27      B      alpha 
 2019-07-28      A       x
 2019-07-28      B      alpha 
 2019-07-29      A       x
 2019-07-29      B      alpha 
 2019-07-30      A       x
 2019-07-30      B      alpha 
 2019-07-31      A       x
 2019-07-31      B      alpha 
 2019-08-01      A       y
 2019-08-01      B      beta
 2019-08-02      A       y
 2019-08-02      B      beta
 2019-08-03      A       z
 2019-08-03      B      gamma
 2019-08-04      A       z
 2019-08-04      B      gamma
 2019-08-05      A       z
 2019-08-05      B      gamma
.
.
.

My current process: I built a daily dimension table covering 2019-07-26 through today with the following query:

SELECT CAST(slice_time AS DATE) dates
FROM testcalendar mtc
TIMESERIES slice_time AS '1 day'
OVER (ORDER BY CAST(mtc.dates AS TIMESTAMP));

So I get:

dates
2019-07-26
2019-07-27
2019-07-28
2019-07-29
...
2019-10-12 (today)

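I was wondering whether I could join my first table to it by date, using a feature like "interpolate previous value" to generate the missing dates from the values of the most recent previous date, but that failed.

The result did not contain rows for the missing dates.

Please let me know if anyone has a better idea for this.

Thanks!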

sql vertica dailybuilds
1 Answer

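As a caveat: only store "daily snapshots" if you really need them. In the past, I have ended up with as many as 364 redundant rows per year, because the value only changed once a year. In Vertica, that costs license capacity, plus CPU and clock time for joining and grouping...

But as for the rest: good start.

You can apply TIMESERIES without having to build a calendar, though.

The trick is to manually "extrapolate" what you can otherwise INTERPOLATE automatically.

Add an inline 'padding' table that contains the newest value per URL (obtained with Vertica's peculiar analytic limit clause, LIMIT 1 OVER(PARTITION BY url ORDER BY upd_dt DESC)), but give it CURRENT_DATE instead of the actual newest date.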
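As a minimal sketch of just that padding step, the query below runs the analytic limit clause on the sample data from the question. The column alias pad_dt is my own label and does not come from the original query. With the sample rows, I would expect one row per URL (A with z, B with gamma), each stamped with today's date.

```sql
WITH
-- sample input from the question
score_upd (Upd_dt,Url,Score) AS (
          SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
)
-- keep only the newest row per url, and relabel it with today's date
SELECT
  CURRENT_DATE AS pad_dt   -- pad_dt is an illustrative alias (assumption)
, url
, score
FROM score_upd
LIMIT 1 OVER(PARTITION BY url ORDER BY upd_dt DESC)
;
```

The OVER() part makes the LIMIT apply per partition, so each URL contributes exactly one row instead of the whole result being cut to a single row.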

UNION SELECT your input with that padding table, and apply the TIMESERIES clause to that union.

For example:

WITH
-- your input ...
score_upd (Upd_dt,Url,Score) AS (
          SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
)
-- real WITH clause would start here ...
,
-- newest row per Url, just with current date
pad_newest AS (
  SELECT
    CURRENT_DATE
  , url
  , score
  FROM score_upd
  LIMIT 1 OVER(PARTITION BY url ORDER BY upd_dt DESC)
)
,
with_newest AS (
  SELECT *
  FROM score_upd
  UNION ALL
  SELECT *
  FROM pad_newest
)
SELECT
  ts_dt::DATE           AS upd_dt
, url                   AS url
, TS_FIRST_VALUE(score) AS score
FROM with_newest
TIMESERIES ts_dt AS '1 day' OVER (
  PARTITION BY url ORDER BY upd_dt::TIMESTAMP
)
ORDER BY 1,2
;
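To check the approach against the expected result in the question, here is a sketch of a variant that pads up to a fixed end date instead of CURRENT_DATE. The end date 2019-08-05 and the alias pad_dt are assumptions chosen to match the sample output; everything else follows the query above. Under those assumptions it should return the 22 rows listed in the question, from 2019-07-26 through 2019-08-05.

```sql
WITH
-- sample input from the question
score_upd (Upd_dt,Url,Score) AS (
          SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
)
,
-- newest row per url, relabelled with a fixed end date (assumption)
pad_newest AS (
  SELECT
    DATE '2019-08-05' AS pad_dt
  , url
  , score
  FROM score_upd
  LIMIT 1 OVER(PARTITION BY url ORDER BY upd_dt DESC)
)
,
-- real rows plus one padding row per url
with_newest AS (
  SELECT * FROM score_upd
  UNION ALL
  SELECT * FROM pad_newest
)
SELECT
  ts_dt::DATE           AS upd_dt
, url                   AS url
, TS_FIRST_VALUE(score) AS score   -- carries the last known score into each gap day
FROM with_newest
TIMESERIES ts_dt AS '1 day' OVER (
  PARTITION BY url ORDER BY upd_dt::TIMESTAMP
)
ORDER BY 1,2
;
```

The padding row matters because TIMESERIES only fills gaps between the first and last input timestamps of each partition; without it, the output would stop at 2019-08-03.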
