In Presto SQL, how to get the max timestamp between two events in a table


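I have a table df like the one below. My goal is to find, for each [user:session] pair, the largest "focus" timestamp after each visit. (The timestamp column is of BIGINT data type and the rows are in ascending order.)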

event  user_id  session_id  timestamp
visit  a        b1          t1
focus  a        b1          t2
focus  a        b1          t3
visit  a        b1          t4
focus  a        b1          t5
focus  a        b1          t6
focus  a        b1          t7
visit  a        b1          t8

I would like the final output to look like this:

user_id  session_id  visit_timestamp  max_focus_timestamp
a        b1          t1               t3
a        b1          t4               t7

How can I achieve this?

Here is what I tried, but the last column (max_focus_timestamp) is always NULL. How can I fix this? Thanks!

with visit as 
(
select 
  user_id,
  session_id,
  timestamp as visit_timestamp,
  lead(timestamp) IGNORE NULLS over (PARTITION by user_id, session_id ORDER BY timestamp) as next_visit_timestamp 
  
from df 
where event = 'visit' 
),
focus as 
(
select 
  user_id,
  session_id,
  timestamp as focus_timestamp
  
from df 
where event != 'visit' 
)
select 
  distinct 
  v.user_id,
  v.session_id,
  v.visit_timestamp,
  max(f.focus_timestamp) as max_focus_timestamp
  
from visit v left join focus f on v.user_id = f.user_id and v.session_id = f.session_id 
 and f.focus_timestamp between v.visit_timestamp and v.next_visit_timestamp 
 
group by 1,2,3
sql max presto trino
1 Answer

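You can use the gaps-and-islands approach (or something very similar to it). Use a window function to build "groups" from a running count of the visit events encountered so far, then group by that count to get the final result: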
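To see how the running count forms the groups, it can help to look at the intermediate column before aggregating. The following is a minimal sketch that assumes the sample rows from the question, inlined as string values; the CTE name dataset and the alias gr are only labels chosen for illustration.

```sql
-- Inspect the running count of 'visit' rows per (user_id, session_id).
-- Assumption: sample rows from the question, with string timestamps t1..t8.
with dataset(event, user_id, session_id, timestamp) as (
    values ('visit', 'a', 'b1', 't1'),
           ('focus', 'a', 'b1', 't2'),
           ('focus', 'a', 'b1', 't3'),
           ('visit', 'a', 'b1', 't4'),
           ('focus', 'a', 'b1', 't5'),
           ('focus', 'a', 'b1', 't6'),
           ('focus', 'a', 'b1', 't7'),
           ('visit', 'a', 'b1', 't8')
)
select event,
       timestamp,
       -- if() without an else branch yields NULL, which sum() skips,
       -- so gr only increases on 'visit' rows
       sum(if(event = 'visit', 1)) over (
           partition by user_id, session_id
           order by timestamp
       ) as gr
from dataset
order by timestamp;
```

With these rows, gr should come out as 1 for t1 to t3, 2 for t4 to t7, and 3 for t8, so each visit shares one value with the focus rows that follow it.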

-- sample data
with dataset(event, user_id, session_id, timestamp) as(
    values  ('visit', 'a', 'b1', 't1'),
    ('focus', 'a', 'b1', 't2'),
    ('focus', 'a', 'b1', 't3'),
    ('visit', 'a', 'b1', 't4'),
    ('focus', 'a', 'b1', 't5'),
    ('focus', 'a', 'b1', 't6'),
    ('focus', 'a', 'b1', 't7'),
    ('visit', 'a', 'b1', 't8')
)

-- query
select user_id,
       session_id,
       min(timestamp) visit_timestamp,
       max(timestamp) max_focus_timestamp
from (select *,
           sum(if(event = 'visit', 1)) over (partition by user_id, session_id order by timestamp) gr
    from dataset)
group by user_id, session_id, gr
having count(*) > 1;

Output:

user_id  session_id  visit_timestamp  max_focus_timestamp
a        b1          t1               t3
a        b1          t4               t7
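
The query above takes min and max over every row in a group, which works because each group starts with its visit row and having count(*) > 1 drops visits that have no focus rows after them. A slightly more explicit variant is sketched below. It assumes that event holds only the values 'visit' and 'focus', and that any rows appearing before a session's first visit should be ignored; the CTE name grouped is hypothetical.

```sql
-- Variant: aggregate visit and focus rows separately inside each group.
-- Assumption: same sample rows as above; 'grouped' is an illustrative name.
with dataset(event, user_id, session_id, timestamp) as (
    values ('visit', 'a', 'b1', 't1'),
           ('focus', 'a', 'b1', 't2'),
           ('focus', 'a', 'b1', 't3'),
           ('visit', 'a', 'b1', 't4'),
           ('focus', 'a', 'b1', 't5'),
           ('focus', 'a', 'b1', 't6'),
           ('focus', 'a', 'b1', 't7'),
           ('visit', 'a', 'b1', 't8')
),
grouped as (
    select *,
           sum(if(event = 'visit', 1)) over (
               partition by user_id, session_id
               order by timestamp
           ) as gr
    from dataset
)
select user_id,
       session_id,
       -- each group contains exactly one visit row
       max(if(event = 'visit', timestamp)) as visit_timestamp,
       max(if(event = 'focus', timestamp)) as max_focus_timestamp
from grouped
where gr is not null                  -- skip rows before the first visit
group by user_id, session_id, gr
having count_if(event = 'focus') > 0  -- keep only visits followed by a focus
order by visit_timestamp;
```

On the sample data this should return the same two rows as the original query. The separate conditional aggregates make the intent visible and avoid relying on the visit row always carrying the smallest timestamp in its group.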