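I'm trying to get a cumulative count of distinct objects over a time series in Redshift. The straightforward approach would be COUNT(DISTINCT myfield) OVER (ORDER BY timefield DESC ROWS UNBOUNDED PRECEDING), but Redshift returns a "window definition is not supported" error.
For example, the code below tries to find the cumulative number of distinct users for every week, from the first week up to now. However, I get a "window function not supported" error.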
SELECT user_time.weeks_ago,
COUNT(distinct user_time.user_id) OVER
(ORDER BY weeks_ago desc ROWS UNBOUNDED PRECEDING) as count
FROM (SELECT FLOOR(EXTRACT(DAY FROM sysdate - ev.time) / 7) AS weeks_ago,
ev.user_id as user_id
FROM events as ev
WHERE ev.action='some_user_action') as user_time
The goal is to build a cumulative time series of the unique users who performed the action. Any ideas on how to do this?
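To pin down what the query should return, here is a small plain-Python reference for a cumulative distinct count. It assumes each event has already been reduced to a (weeks_ago, user_id) pair, as in the inner subquery above. It is only a checking aid for the SQL answers, not a Redshift technique. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def cumulative_distinct_users(events):
    """events: iterable of (weeks_ago, user_id) pairs (hypothetical input shape).

    Returns (weeks_ago, cumulative_distinct_users) tuples ordered from the
    oldest week (largest weeks_ago) to the most recent week.
    """
    users_by_week = defaultdict(set)
    for weeks_ago, user_id in events:
        users_by_week[weeks_ago].add(user_id)

    seen = set()  # every user seen in this week or any earlier week
    result = []
    for week in sorted(users_by_week, reverse=True):  # oldest week first
        seen |= users_by_week[week]
        result.append((week, len(seen)))
    return result


if __name__ == "__main__":
    sample = [(2, "u1"), (2, "u2"), (1, "u1"), (1, "u3"), (0, "u2")]
    print(cumulative_distinct_users(sample))
```

A user who shows up again in a later week does not increase the count. That is exactly what COUNT(DISTINCT ...) OVER (... ROWS UNBOUNDED PRECEDING) would compute if Redshift supported it.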
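Figured out the answer. It turns out to be a set of nested subqueries:
- the innermost subquery computes the week of each user's first action;
- the middle subquery counts how many users had their first action in each week;
- the outer query takes a running sum of those counts over the time series: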
SELECT engaged_per_week.week AS week,
SUM(engaged_per_week.total) OVER (ORDER BY engaged_per_week.week DESC ROWS UNBOUNDED PRECEDING) AS total
FROM
-- COUNT OF FIRST TIME ENGAGEMENTS PER WEEK
(SELECT engaged.first_week AS week,
COUNT(engaged.first_week) AS total
FROM
-- WEEK OF FIRST ENGAGEMENT FOR EACH USER (LARGEST weeks_ago = EARLIEST WEEK)
(SELECT MAX(FLOOR(EXTRACT(DAY FROM sysdate - ev.time) / 7)) AS first_week
FROM events ev
WHERE ev.action = 'some_user_action'
GROUP BY ev.user_id) AS engaged
GROUP BY engaged.first_week) AS engaged_per_week
ORDER BY week DESC;
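Below is a minimal sketch of the same three-level query, run against SQLite through Python's standard sqlite3 module so it can be tried locally. It makes two assumptions:
- SQLite 3.25 or newer (needed for window functions) stands in for Redshift.
- The hypothetical events table stores weeks_ago directly, so the FLOOR(EXTRACT(...)) step is dropped.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for Redshift, and the hypothetical events
# table already stores weeks_ago instead of a timestamp.
QUERY = """
SELECT engaged_per_week.week AS week,
       SUM(engaged_per_week.total) OVER (
           ORDER BY engaged_per_week.week DESC ROWS UNBOUNDED PRECEDING) AS total
FROM (
    -- count of first-time engagements per week
    SELECT engaged.first_week AS week, COUNT(*) AS total
    FROM (
        -- week of first engagement for each user (largest weeks_ago = earliest)
        SELECT MAX(ev.weeks_ago) AS first_week
        FROM events ev
        WHERE ev.action = 'some_user_action'
        GROUP BY ev.user_id
    ) AS engaged
    GROUP BY engaged.first_week
) AS engaged_per_week
ORDER BY week DESC
"""

# Hypothetical sample: u1 acts in weeks 2 and 1, u2 in weeks 2 and 0, u3 in week 1.
SAMPLE_EVENTS = [
    ("u1", 2, "some_user_action"),
    ("u2", 2, "some_user_action"),
    ("u1", 1, "some_user_action"),
    ("u3", 1, "some_user_action"),
    ("u2", 0, "some_user_action"),
    ("u4", 0, "other_action"),  # filtered out by the WHERE clause
]


def cumulative_engaged(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, weeks_ago INTEGER, action TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(cumulative_engaged(SAMPLE_EVENTS))
```

Week 0 produces no row here because nobody engaged for the first time that week. If you need a row for every week, you would have to join the result against a list of all weeks and treat missing totals as 0 before the running sum.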
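Here's how to apply it to the example referenced here. I also added another row duplicating 'table' for '2015-01-01' to demonstrate how it counts distinct values.
The author of that example got the solution wrong, but I'm just reusing his example.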
create table public.test
(
"date" date,
item varchar(8),
measure int
);
insert into public.test
values
('2015-01-01', 'table', 12),
('2015-01-01', 'table', 120),
('2015-01-01', 'chair', 51),
('2015-01-01', 'lamp', 8),
('2015-01-02', 'table', 17),
('2015-01-02', 'chair', 72),
('2015-01-02', 'lamp', 23),
('2015-01-02', 'bed', 1),
('2015-01-02', 'dresser', 2),
('2015-01-03', 'bed', 1);
WITH x AS (
SELECT
*,
DENSE_RANK()
OVER (PARTITION BY date
ORDER BY item) AS dense_rank
FROM public.test
)
SELECT
"date",
item,
measure,
max(dense_rank)
OVER (PARTITION BY "date")
FROM x
ORDER BY 1;
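The subquery gives you the dense rank of each item within each date. The main query then takes the maximum dense rank for each date, which is the distinct count of items for that date.
You need DENSE_RANK rather than plain RANK to get a distinct count. RANK skips numbers after ties (e.g. 1, 1, 3), so its maximum can exceed the number of distinct values. DENSE_RANK assigns consecutive ranks (1, 1, 2).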
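The DENSE_RANK trick can be checked locally with Python's sqlite3 module. This sketch makes a few adjustments:
- SQLite 3.25+ stands in for Redshift.
- It uses the public.test rows above, without the schema prefix.
- It renames the dense_rank column to item_rank.
- It compares the result with a plain GROUP BY ... COUNT(DISTINCT item).

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for Redshift. Same rows as public.test
# above, including the duplicated '2015-01-01' 'table' row.
ROWS = [
    ("2015-01-01", "table", 12),
    ("2015-01-01", "table", 120),
    ("2015-01-01", "chair", 51),
    ("2015-01-01", "lamp", 8),
    ("2015-01-02", "table", 17),
    ("2015-01-02", "chair", 72),
    ("2015-01-02", "lamp", 23),
    ("2015-01-02", "bed", 1),
    ("2015-01-02", "dresser", 2),
    ("2015-01-03", "bed", 1),
]

# Max DENSE_RANK within a partition == number of distinct items in it.
DENSE_RANK_QUERY = """
WITH x AS (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY "date" ORDER BY item) AS item_rank
    FROM test
)
SELECT "date", item, measure,
       MAX(item_rank) OVER (PARTITION BY "date") AS distinct_items
FROM x
ORDER BY 1, 2
"""

# Plain aggregate version, used only as a cross-check.
GROUP_BY_QUERY = 'SELECT "date", COUNT(DISTINCT item) FROM test GROUP BY "date"'


def _load():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE test ("date" TEXT, item TEXT, measure INTEGER)')
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", ROWS)
    return conn


def distinct_items_per_date():
    conn = _load()
    # Every row of a date carries the same distinct_items value; the dict keeps one.
    result = {day: n for day, _item, _measure, n in conn.execute(DENSE_RANK_QUERY)}
    conn.close()
    return result


def distinct_items_group_by():
    conn = _load()
    result = dict(conn.execute(GROUP_BY_QUERY).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(distinct_items_per_date())
    print(distinct_items_per_date() == distinct_items_group_by())
```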
You should use DENSE_RANK instead of COUNT(DISTINCT):
DENSE_RANK() OVER(PARTITION BY weeks_ago ORDER BY user_time.user_id)
Using COUNT(DISTINCT) inside a SUM like this seems to work:
SELECT user_time.weeks_ago,
SUM(COUNT(distinct user_time.user_id)) OVER
(ORDER BY weeks_ago desc ROWS UNBOUNDED PRECEDING) as test
FROM (SELECT FLOOR(EXTRACT(DAY FROM sysdate - ev.time) / 7) AS weeks_ago
,ev.user_id as user_id
FROM events as ev
WHERE ev.action='some_user_action'
) user_time
GROUP BY user_time.weeks_ago
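One caveat worth checking: this query sums the per-week distinct counts. A user who is active in several weeks is counted once in each of them, so the running total can exceed the true number of distinct users.

The sketch below uses Python's sqlite3 module, assuming SQLite 3.25+ and a hypothetical events table that holds weeks_ago directly. It compares two things:
- the running sum, written with the per-week COUNT(DISTINCT) in a subquery;
- a cumulative distinct count computed with a correlated subquery.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for Redshift; the hypothetical events
# table stores weeks_ago directly. u1 is active in weeks 2 and 1, u2 in
# weeks 2 and 0, u3 only in week 1.
SAMPLE_EVENTS = [("u1", 2), ("u2", 2), ("u1", 1), ("u3", 1), ("u2", 0)]

# Same logic as SUM(COUNT(DISTINCT user_id)) OVER (...) ... GROUP BY weeks_ago.
SUMMED_QUERY = """
SELECT weeks_ago,
       SUM(n) OVER (ORDER BY weeks_ago DESC ROWS UNBOUNDED PRECEDING) AS running
FROM (SELECT weeks_ago, COUNT(DISTINCT user_id) AS n
      FROM events GROUP BY weeks_ago)
ORDER BY weeks_ago DESC
"""

# Correlated subquery: distinct users over every week up to and including w.
TRUE_CUMULATIVE_QUERY = """
SELECT w.weeks_ago,
       (SELECT COUNT(DISTINCT e.user_id) FROM events e
        WHERE e.weeks_ago >= w.weeks_ago) AS running
FROM (SELECT DISTINCT weeks_ago FROM events) AS w
ORDER BY w.weeks_ago DESC
"""


def _run(query, rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, weeks_ago INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def summed_per_week(rows):
    return _run(SUMMED_QUERY, rows)


def true_cumulative(rows):
    return _run(TRUE_CUMULATIVE_QUERY, rows)


if __name__ == "__main__":
    print(summed_per_week(SAMPLE_EVENTS))
    print(true_cumulative(SAMPLE_EVENTS))
```

With this sample, the summed version gives 2, 4, 5, while the actual cumulative distinct counts are 2, 3, 3. The summed query is therefore only exact when no user appears in more than one week.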
I ran into the same problem, but I applied DENSE_RANK() and MAX() OVER (PARTITION BY) in the code below. Hopefully it helps anyone still struggling with this:
-- IN NZ
select
id,NAME,count(distinct name) OVER (
PARTITION BY id)
from
edw.admin.test;
/*
create table edw.admin.test
as
(
select 1 as id,'Anne' as name,500.0 as amt,'iv' as IID
union ALL
select 1,'Jeni',550.0,'is'
union ALL
select 1,'Arna',250.0,'is'
union ALL
select 2,'Raj',290.0,'is'
union ALL
select 1,'Anne',350.0,'ir'
union ALL
select 1,NULL,350.0,'ir'
union ALL
select 3,NULL,350.0,'ir'
union ALL
select 3,NULL,350.0,'ir');
Output in NZ:
-------------------------
ID NAME COUNT
1 NULL 3
1 Anne 3
1 Anne 3
1 Arna 3
1 Jeni 3
2 Raj 1
3 NULL 0
3 NULL 0
*/
-- IN AWS RS
select id, name, max(DENSE_COUNT) over(partition by id)
from(
select
id,name,CASE WHEN name IS NULL THEN 0 ELSE DENSE_RANK() OVER (
PARTITION BY id
order by name) END AS DENSE_COUNT
from
(
select 1 as id,'Anne' as name,500.0 as amt,'iv' as IID
union ALL
select 1,'Jeni',550.0,'is'
union ALL
select 1,'Arna',250.0,'is'
union ALL
select 2,'Raj',290.0,'is'
union ALL
select 1,'Anne',350.0,'ir'
union ALL
select 1,NULL,350.0,'ir'
union ALL
select 3,NULL,350.0,'ir'
union ALL
select 3,NULL,350.0,'ir') AS t) AS ranked;
/*
Output in RS:
-------------------------
id name max
1 Anne 3
1 Anne 3
1 Arna 3
1 Jeni 3
1 NULL 3
2 Raj 1
3 NULL 0
3 NULL 0
*/
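The NULL handling in the Redshift version depends on where NULLs sort. Redshift places NULLs last in ascending order by default, so real names get ranks 1..n and the NULL row's rank is simply replaced by 0. SQLite sorts NULLs first, which would shift every real name's rank up by one. This sketch therefore orders by (name IS NULL, name) to reproduce the NULLS-LAST behaviour. It assumes SQLite 3.25+ via Python's sqlite3 module, and the table is loaded with the same rows as the UNION ALL above.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for Redshift. "name IS NULL" is 0 for real
# names and 1 for NULLs, so ordering by it first keeps NULLs after all names.
ROWS = [
    (1, "Anne", 500.0, "iv"),
    (1, "Jeni", 550.0, "is"),
    (1, "Arna", 250.0, "is"),
    (2, "Raj", 290.0, "is"),
    (1, "Anne", 350.0, "ir"),
    (1, None, 350.0, "ir"),
    (3, None, 350.0, "ir"),
    (3, None, 350.0, "ir"),
]

QUERY = """
SELECT id, name, MAX(dense_count) OVER (PARTITION BY id) AS distinct_names
FROM (
    SELECT id, name,
           CASE WHEN name IS NULL THEN 0
                ELSE DENSE_RANK() OVER (PARTITION BY id ORDER BY name IS NULL, name)
           END AS dense_count
    FROM test
) AS ranked
"""


def null_safe_distinct_per_id():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, name TEXT, amt REAL, iid TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", ROWS)
    # Every row of an id carries the same distinct_names value; the dict keeps one.
    result = {row_id: n for row_id, _name, n in conn.execute(QUERY)}
    conn.close()
    return result


if __name__ == "__main__":
    print(null_safe_distinct_per_id())
```

The per-id values match the NZ and RS outputs above: 3 for id 1, 1 for id 2, and 0 for id 3, whose names are all NULL.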