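I have a table of student data that includes each student's start date and withdrawal date. Not all students have a withdrawal date, because some are still enrolled. I'm trying to calculate the number of new students per month, the number of withdrawals per month, and the number of existing students per month. I've already worked out the monthly new students and withdrawals, but I'm having trouble calculating the existing students.
The data starts in January 2023.
This is the query I came up with: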
WITH student_activity AS
(
-- Convert start and withdrawal date keys to actual date format
SELECT to_date(fe.start_date_key::text, 'YYYYMMDD') AS start_date,
to_date(fe.withdrawal_date_key::text, 'YYYYMMDD') AS withdrawal_date,
dp.product_name, dp.sku
FROM fact_enrolment fe
INNER JOIN dim_product dp ON fe.product_key = dp.product_key
)
SELECT date_trunc('month', month_series) AS month,
COUNT(*) AS existing_students,
sa.product_name
FROM (
SELECT generate_series(
(SELECT MIN(to_date(start_date_key::text, 'YYYYMMDD')) FROM fact_enrolment),'2100-12-31',INTERVAL '1 month') AS month_series) AS months
LEFT JOIN student_activity sa ON sa.start_date < month_series AND (sa.withdrawal_date IS NULL OR sa.withdrawal_date >= month_series)
GROUP BY month, sa.product_name
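This query does get the existing counts right, but I'm facing one problem: if no students start in a given month, that month doesn't appear in the result set at all. I need every month to be shown regardless. If a month has no new students, it should carry over the previous month's count, because those students are still enrolled.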
CREATE TABLE fact_enrolment (
student_key int4 NULL,
start_date_key int4 NULL,
withdrawal_date_key int4 NULL,
product_key int4 NULL);
CREATE TABLE dim_product (
product_key int4 GENERATED ALWAYS AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 START 1 CACHE 1 NO CYCLE) NOT NULL,
product_name varchar NULL,
sku varchar NULL);
INSERT INTO dim_product (product_name , sku) VALUES ('Preschool' , 'ABC123');
INSERT INTO fact_enrolment (student_key, start_date_key, withdrawal_date_key, product_key) VALUES
(12, 20230105, 20230130, 1)
, (14, 20230106, 20230120, 1)
, (45, 20230405, 20230420, 1);
INSERT INTO fact_enrolment (student_key, start_date_key, product_key) VALUES
(17, 20230110, 1)
, (20, 20230120, 1)
, (21, 20230220, 1)
, (22, 20230202, 1)
, (23, 20230228, 1)
, (34, 20230206, 1)
, (44, 20230406, 1);
Expected result, calculated from the given sample:
Month | Additions | Withdrawals | Existing students |
---|---|---|---|
2023-01 | 4 | 2 | 4 |
2023-02 | 4 | 0 | 6 |
2023-03 | 0 | 0 | 6 |
2023-04 | 2 | 1 | 8 |
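This assumes that the withdrawals in the table above take effect at the end of the month, so they are subtracted from the following month's count.
In the example, note that 2023-03 still shows the previous month's count.
I'm using Postgres version 13.15.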
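An aggregate `filter` clause lets you `count(*)` different things in a single pass. Window functions then let you run two stepping sums of those counts and subtract one from the other to get the current balance.
WITH student_activity AS (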
SELECT date_trunc('month',to_date(fe.start_date_key::text, 'YYYYMMDD'))::date AS start_month
, date_trunc('month',to_date(fe.withdrawal_date_key::text, 'YYYYMMDD'))::date AS withdrawal_month
, dp.product_name
, dp.sku
FROM fact_enrolment AS fe
JOIN dim_product AS dp
ON fe.product_key = dp.product_key )
,calendar as (
select month::date
from (select min(start_month) as earliest_month
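, greatest(max(start_month), max(withdrawal_month)) as latest_month -- greatest() skips NULLs, so the calendar still reaches the last enrollment month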
from student_activity) as limits
cross join lateral generate_series( earliest_month
,latest_month
,'1 month'::interval) as month)
,monthlies as (
select month
, count(*)filter(where month=start_month) as enrollments
, count(*)filter(where month=withdrawal_month) as withdrawals
from calendar
left join student_activity as fe
on month=any(array[start_month,withdrawal_month])
group by month)
select*, sum(enrollments)over w1
-sum(withdrawals)over w1 as "existing students"
from monthlies
window w1 as(order by month)
order by month;
I'm not sure why you'd expect 8 students to remain after 10 enrolled and 3 withdrew. That leaves 7:
month | enrollments | withdrawals | existing students |
---|---|---|---|
2023-01-01 | 4 | 2 | 2 |
2023-02-01 | 4 | 0 | 6 |
2023-03-01 | 0 | 0 | 6 |
2023-04-01 | 2 | 1 | 7 |
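One possible reading of the expected 8: the question states that withdrawals take effect at month-end and are subtracted only from the following month's count. Below is a minimal sketch under that assumption, not a definitive implementation. It keeps the running sum of enrollments as in the query above, but ends the running sum of withdrawals one row early using an explicit window frame. The `existing_students` alias, the `m` alias and the `greatest(...)` upper bound for the calendar are my own choices for illustration.

```sql
WITH student_activity AS (
    -- Truncate both date keys to the first day of their month
    SELECT date_trunc('month', to_date(fe.start_date_key::text, 'YYYYMMDD'))::date      AS start_month
         , date_trunc('month', to_date(fe.withdrawal_date_key::text, 'YYYYMMDD'))::date AS withdrawal_month
    FROM fact_enrolment AS fe )
, calendar AS (
    -- One row per month, from the first enrollment to the last recorded activity
    SELECT m::date AS month
    FROM (SELECT min(start_month) AS earliest_month
               , greatest(max(start_month), max(withdrawal_month)) AS latest_month
          FROM student_activity) AS limits
    CROSS JOIN LATERAL generate_series( earliest_month
                                      , latest_month
                                      , interval '1 month') AS m )
, monthlies AS (
    -- Months with no activity survive the left join and count as 0 / 0
    SELECT c.month
         , count(*) FILTER (WHERE c.month = sa.start_month)      AS enrollments
         , count(*) FILTER (WHERE c.month = sa.withdrawal_month) AS withdrawals
    FROM calendar AS c
    LEFT JOIN student_activity AS sa
           ON c.month = ANY (ARRAY[sa.start_month, sa.withdrawal_month])
    GROUP BY c.month )
SELECT month
     , enrollments
     , withdrawals
       -- Assumption: a withdrawal only reduces the count from the NEXT month on,
       -- so the withdrawal sum stops at the previous row. The frame is empty for
       -- the first month, hence the coalesce.
     , sum(enrollments) OVER (ORDER BY month)
       - coalesce(sum(withdrawals) OVER (ORDER BY month
                                         ROWS BETWEEN UNBOUNDED PRECEDING
                                                  AND 1 PRECEDING), 0) AS existing_students
FROM monthlies
ORDER BY month;
```

On the sample data this gives 4, 6, 6, 8 for `existing_students`, matching the table in the question. The `ROWS` frame relies on `monthlies` having exactly one row per month, which the `GROUP BY` guarantees.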