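I wrote the following code to count unique customers over the past 30 days. How can I repurpose it to count unique customers per month, keyed by each month's start date? I'm trying to build a monthly aggregate from table_billing, which has daily granularity. Can you point me in the right direction?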
```sql
select 'context.processingDate' as rptg_dt,
       COALESCE(item_type, 'ALL_ITEMS') as item_type,
       unique_customers
from (select
          (case when item_type_code in ('A') then 'Books'
                when item_type_code in ('B', 'C') then 'Toys'
                else 'Fruits' end) as item_type,
          count(distinct person_id) as unique_customers
      from table_billing
      where rptg_dt between cast('context.processingDate' as date format 'YYYY-MM-DD') - 30
                        and cast('context.processingDate' as date format 'YYYY-MM-DD')
        and item_type_code in ('A', 'B', 'C', 'D', 'E')
      group by CUBE(1)
     ) a;
```
Desired output:
| Monthly Start Date | Item Type | Unique Customers |
|---|---|---|
| 5/1/14 | Books | 100 |
| 5/1/14 | Toys | 80 |
| 5/1/14 | Fruits | 25 |
| 5/1/14 | ALL_ITEMS | 175 |
| 6/1/14 | Books | 80 |
| 6/1/14 | Toys | 60 |
| 6/1/14 | Fruits | 40 |
| 6/1/14 | ALL_ITEMS | 95 |
I'd like to rewrite the query along these lines:
```sql
select 'context.processingDate' as month_start_dt,
       COALESCE(item_type, 'ALL_ITEMS') as item_type,
       unique_customers
from (select
          (case when item_type_code in ('A') then 'Books'
                when item_type_code in ('B', 'C') then 'Toys'
                else 'Fruits' end) as item_type,
          count(distinct person_id) as unique_customers
      from table_billing
      where month_start_dt = cast('context.processingDate' as date format 'YYYY-MM-DD')
        and item_type_code in ('A', 'B', 'C', 'D', 'E')
      group by CUBE(1)
     ) a;
```
How would I adjust the query to make this work? Thank you!
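I assume `context.processingDate` is how you pass a parameter into your script. For a monthly aggregation you want to process all available data, so that parameter is no longer needed.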
The answer is then:
```sql
select monthly_rptg_dt,
       COALESCE(item_type, 'ALL_ITEMS') as item_type,  -- CUBE's total row has item_type NULL
       unique_customers
from (select
          date_trunc('MONTH', rptg_dt) as monthly_rptg_dt,  -- first day of the month
          (case when item_type_code in ('A') then 'Books'
                when item_type_code in ('B', 'C') then 'Toys'
                else 'Fruits' end) as item_type,
          count(distinct person_id) as unique_customers
      from table_billing
      where item_type_code in ('A', 'B', 'C', 'D', 'E')
      group by 1, CUBE(2)  -- always group by month; CUBE only over item_type
     ) a;
```
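To check the grouping logic without access to the real database, here is a minimal sketch using Python's standard-library `sqlite3` module with made-up sample rows. It is an approximation, not the query above verbatim: SQLite has neither `date_trunc` nor `CUBE`, so `date(rptg_dt, 'start of month')` stands in for the month truncation, and a `UNION ALL` of a per-item grouping and a per-month grouping emulates what `group by 1, CUBE(2)` produces. The function name `monthly_unique_customers` and the sample data are hypothetical.

```python
import sqlite3

# Hypothetical sample data at daily grain: (rptg_dt, item_type_code, person_id).
ROWS = [
    ("2014-05-03", "A", 1),
    ("2014-05-10", "B", 1),  # customer 1 also buys Toys in May
    ("2014-05-20", "D", 2),
    ("2014-06-02", "A", 3),
    ("2014-06-15", "C", 3),
    ("2014-06-30", "E", 4),
    ("2014-06-30", "Z", 5),  # excluded by the item_type_code filter
]

MONTHLY_SQL = """
WITH base AS (
    SELECT date(rptg_dt, 'start of month') AS monthly_rptg_dt,  -- stands in for date_trunc('MONTH', ...)
           CASE WHEN item_type_code IN ('A') THEN 'Books'
                WHEN item_type_code IN ('B', 'C') THEN 'Toys'
                ELSE 'Fruits' END AS item_type,
           person_id
    FROM table_billing
    WHERE item_type_code IN ('A', 'B', 'C', 'D', 'E')
)
-- One row per (month, item type) ...
SELECT monthly_rptg_dt, item_type, COUNT(DISTINCT person_id) AS unique_customers
FROM base
GROUP BY monthly_rptg_dt, item_type
UNION ALL
-- ... plus the CUBE-style total per month: distinct across items, not a sum.
SELECT monthly_rptg_dt, 'ALL_ITEMS', COUNT(DISTINCT person_id)
FROM base
GROUP BY monthly_rptg_dt
ORDER BY monthly_rptg_dt, item_type
"""


def monthly_unique_customers(rows):
    """Load rows into an in-memory table and run the monthly aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_billing (rptg_dt TEXT, item_type_code TEXT, person_id INTEGER)"
    )
    conn.executemany("INSERT INTO table_billing VALUES (?, ?, ?)", rows)
    result = conn.execute(MONTHLY_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_unique_customers(ROWS):
        print(row)
```

With these sample rows, May 2014 has one customer each for Books, Toys and Fruits but only 2 for `ALL_ITEMS`, because customer 1 bought in both Books and Toys. That is also why the `ALL_ITEMS` figures in the desired output (175, 95) are smaller than the sums of the per-item rows.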
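I also assume there are no gaps (months with no data), or that you don't need them in the output. Otherwise, you would have to outer join this result to a list of months, or use `TIMESERIES`.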
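For the outer-join option, here is a sketch, again in Python's `sqlite3` with made-up rows. Assumptions: the month list is generated by a recursive CTE spanning the min/max `rptg_dt` (in practice a calendar table would usually supply it), the three item types are hard-coded, and the `ALL_ITEMS` total is omitted for brevity. Every month × item type combination appears, with 0 where there was no data. It does not cover the `TIMESERIES` alternative. The function name `gap_filled_monthly_counts` is hypothetical.

```python
import sqlite3

# Hypothetical sample data with a gap: nothing is billed in June 2014.
ROWS = [
    ("2014-05-03", "A", 1),
    ("2014-07-09", "B", 2),
]

GAP_FILLED_SQL = """
WITH RECURSIVE months(month_start) AS (
    -- Month starts from the first to the last month present in the data.
    SELECT date(MIN(rptg_dt), 'start of month') FROM table_billing
    UNION ALL
    SELECT date(month_start, '+1 month') FROM months
    WHERE month_start < (SELECT date(MAX(rptg_dt), 'start of month') FROM table_billing)
),
item_types(item_type) AS (
    VALUES ('Books'), ('Toys'), ('Fruits')
),
monthly AS (
    SELECT date(rptg_dt, 'start of month') AS month_start,
           CASE WHEN item_type_code IN ('A') THEN 'Books'
                WHEN item_type_code IN ('B', 'C') THEN 'Toys'
                ELSE 'Fruits' END AS item_type,
           COUNT(DISTINCT person_id) AS unique_customers
    FROM table_billing
    WHERE item_type_code IN ('A', 'B', 'C', 'D', 'E')
    GROUP BY month_start, item_type
)
-- Outer join the full month x item_type grid to the aggregate; missing -> 0.
SELECT m.month_start, t.item_type, COALESCE(a.unique_customers, 0) AS unique_customers
FROM months m
CROSS JOIN item_types t
LEFT JOIN monthly a
       ON a.month_start = m.month_start AND a.item_type = t.item_type
ORDER BY m.month_start, t.item_type
"""


def gap_filled_monthly_counts(rows):
    """Return (month_start, item_type, unique_customers) with empty months filled."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_billing (rptg_dt TEXT, item_type_code TEXT, person_id INTEGER)"
    )
    conn.executemany("INSERT INTO table_billing VALUES (?, ?, ?)", rows)
    result = conn.execute(GAP_FILLED_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in gap_filled_monthly_counts(ROWS):
        print(row)
```

The `CROSS JOIN` builds the full grid first so that a month with no rows at all still shows up; joining only to the month list would restore empty months but not empty item types within a month.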