I have a table subscriptions in PostgreSQL 10.5:
id user_id starts_at ends_at
--------------------------------
1 233 02/04/19 03/03/19
2 233 03/04/19 04/03/19
3 296 02/09/19 03/08/19
4 126 02/01/19 02/28/19
5 126 03/01/19 03/31/19
6 922 02/22/19 03/22/19
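For each week, I want to count how many new subscribers we have. A new subscriber is any user_id that has no subscription entry before that week.
Edit: I slightly modified @fubar's solution to fit my preferred date format. I forgot to clarify here that I also want to see the weeks that have 0 new subscribers. How can generate_series be integrated into the query below so that weeks with 0 subscribers show up as well?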
SELECT TO_CHAR(date_trunc('week', s.starts_at), 'YYYY-MM-DD') as week, COUNT(*) AS count
FROM subscriptions s
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
GROUP BY week
ORDER BY week desc
You can find each user's first subscription with the following query:
SELECT s.*
FROM subscriptions s
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
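Then you can count the number of new subscribers per year/week with the following query:
SELECT
-- ISOYEAR pairs with the ISO week number that EXTRACT(WEEK ...) returns,
-- so dates around New Year land in the right year/week bucket
EXTRACT(ISOYEAR FROM s.starts_at) AS year,
EXTRACT(WEEK FROM s.starts_at) AS week,
COUNT(*) AS count
FROM subscriptions s
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
GROUP BY year, week;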
Here is an updated query that combines my answer above with generate_series() and your preferred week date format.
SELECT
TO_CHAR(date_trunc('week', w.date), 'YYYY-MM-DD') AS week,
COUNT(DISTINCT s.*) AS count
FROM generate_series('2018-12-31', NOW(), INTERVAL '1 WEEK') w(date)
LEFT JOIN subscriptions s ON s.starts_at BETWEEN w.date AND w.date + INTERVAL '6 DAY'
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
GROUP BY w.date;
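One thing to watch in the query above: WHERE s1.id IS NULL is applied after both LEFT JOINs, so a week whose only subscriptions are renewals has all of its rows filtered out and drops from the result instead of showing 0. A possible variant, offered as a sketch rather than a definitive fix, moves the anti-join into a derived table so that every generated week survives. The alias names f and week_start are my own, and the half-open range replaces BETWEEN so that it also behaves if starts_at is a timestamp.

```sql
SELECT
  TO_CHAR(w.week_start, 'YYYY-MM-DD') AS week,
  COUNT(f.id) AS count                      -- counts 0 when no first subscription matches
FROM generate_series('2018-12-31'::timestamp, now()::timestamp, INTERVAL '1 week') AS w(week_start)
LEFT JOIN (
  -- first subscription per user, using the same anti-join as above
  SELECT s.id, s.starts_at
  FROM subscriptions s
  LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
  WHERE s1.id IS NULL
) AS f
  ON f.starts_at >= w.week_start
 AND f.starts_at <  w.week_start + INTERVAL '1 week'
GROUP BY w.week_start
ORDER BY w.week_start DESC;
```

Because the filter now runs before the join to the week series, the outer LEFT JOIN keeps weeks that have no first-time subscriptions.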
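I +1'd fubar's solution. It works on all RDBMSs.
I'll offer another approach, which is Postgres-specific because it relies on DISTINCT ON.
Find the date of each user's first subscription: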
select
distinct on (s.user_id)
s.*
from subscriptions s
order by s.user_id, s.starts_at;
Output:
| id | user_id | starts_at | ends_at |
| --- | ------- | ------------------------ | ------------------------ |
| 4 | 126 | 2019-02-01T00:00:00.000Z | 2019-02-28T00:00:00.000Z |
| 1 | 233 | 2019-01-04T00:00:00.000Z | 2019-03-03T00:00:00.000Z |
| 3 | 296 | 2019-02-09T00:00:00.000Z | 2019-03-08T00:00:00.000Z |
| 6 | 922 | 2019-02-22T00:00:00.000Z | 2019-03-22T00:00:00.000Z |
Schema
CREATE TABLE subscriptions (
id INT NOT NULL,
user_id INT NOT NULL,
starts_at DATE,
ends_at DATE,
PRIMARY KEY(id)
);
INSERT INTO subscriptions VALUES
(1, 233, '2019-01-04', '2019-03-03'),
(2, 233, '2019-03-04', '2019-04-04'),
(3, 296, '2019-02-09', '2019-03-08'),
(4, 126, '2019-02-01', '2019-02-28'),
(5, 126, '2019-03-01', '2019-03-31'),
(6, 922, '2019-02-22', '2019-03-22');
Get the number of new subscribers per week
Live test: https://www.db-fiddle.com/f/vhzw4KvANA6Mvi59NDTy3H/0
with first_time
as
(
select
distinct on (s.user_id)
s.*
from subscriptions s
order by s.user_id, s.starts_at
)
select gs.wk, count(ft.*) as new_subscribers_for_the_week
from
generate_series('2019-02-25'::date, now()::date, interval '1 week') gs(wk)
left join first_time ft
on gs.wk >= ft.starts_at and gs.wk <= ft.ends_at
group by gs.wk
order by gs.wk;
Output:
| wk | new_subscribers_for_the_week |
| ------------------------ | ---------------------------- |
| 2019-02-25T00:00:00.000Z | 4 |
| 2019-03-04T00:00:00.000Z | 2 |
| 2019-03-11T00:00:00.000Z | 1 |
| 2019-03-18T00:00:00.000Z | 1 |
| 2019-03-25T00:00:00.000Z | 0 |
| 2019-04-01T00:00:00.000Z | 0 |
| 2019-04-08T00:00:00.000Z | 0 |
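A note on how to read these numbers: the join condition matches every week whose start date falls between starts_at and ends_at, so each row counts first subscriptions that are still active at the start of that week. If the intent is to count each user only in the week their first subscription begins, something along these lines might be closer. It is a sketch under my own assumptions: the series start of 2018-12-31 (a Monday, chosen so it covers the earliest sample row) is not from the original answer.

```sql
with first_time as (
  -- earliest subscription per user (Postgres-specific DISTINCT ON)
  select distinct on (s.user_id) s.*
  from subscriptions s
  order by s.user_id, s.starts_at
)
select gs.wk::date as wk, count(ft.id) as new_subscribers_for_the_week
from generate_series('2018-12-31'::timestamp, now()::timestamp, interval '1 week') gs(wk)
left join first_time ft
  -- date_trunc('week', ...) returns the Monday of the week containing starts_at
  on date_trunc('week', ft.starts_at::timestamp) = gs.wk
group by gs.wk
order by gs.wk;
```

With the sample rows from the schema above, this should report 1 for the weeks starting 2018-12-31, 2019-01-28, 2019-02-04 and 2019-02-18, and 0 for every other week.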