Counting first-time subscribers by week


I have a table, subscriptions, in PostgreSQL 10.5:

id  user_id  starts_at  ends_at
--------------------------------
1   233      02/04/19   03/03/19
2   233      03/04/19   04/03/19
3   296      02/09/19   03/08/19
4   126      02/01/19   02/28/19
5   126      03/01/19   03/31/19
6   922      02/22/19   03/22/19

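For each week, I want to count how many new subscribers we had. A new subscriber is any user_id that has no subscription entry before that week.

Edit: I slightly modified @fubar's solution to fit the date format I prefer. I forgot to clarify that I also want to see weeks with 0 new subscribers. How can I integrate generate_series into the query below so that those weeks show up with a count of 0?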

SELECT TO_CHAR(date_trunc('week', s.starts_at), 'YYYY-MM-DD') as week, COUNT(*) AS count
FROM subscriptions s
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
GROUP BY week
ORDER BY week desc
sql postgresql
2 Answers

3 votes

You can find each user's first subscription with the following query:

SELECT s.*
FROM subscriptions s
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL

You can then count the new subscribers per year and week with the following query:

SELECT
    -- ISOYEAR pairs with WEEK (the ISO 8601 week number), so dates around New Year are grouped under the matching year.
    EXTRACT(ISOYEAR FROM s.starts_at) AS year,
    EXTRACT(WEEK FROM s.starts_at) AS week,
    COUNT(*) AS count
FROM subscriptions s
LEFT JOIN subscriptions s1 ON s.user_id = s1.user_id AND s.starts_at > s1.starts_at
WHERE s1.id IS NULL
GROUP BY year, week;
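
As a way to cross-check the query's result outside the database, here is a minimal Python sketch of the same idea: keep the earliest starts_at per user, then count those dates per ISO year and ISO week. The rows are the sample data from the question; the names SUBSCRIPTIONS, first_start_per_user and new_subscribers_by_iso_week are made up for this illustration and are not part of the original answer.

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (id, user_id, starts_at, ends_at).
SUBSCRIPTIONS = [
    (1, 233, date(2019, 2, 4), date(2019, 3, 3)),
    (2, 233, date(2019, 3, 4), date(2019, 4, 3)),
    (3, 296, date(2019, 2, 9), date(2019, 3, 8)),
    (4, 126, date(2019, 2, 1), date(2019, 2, 28)),
    (5, 126, date(2019, 3, 1), date(2019, 3, 31)),
    (6, 922, date(2019, 2, 22), date(2019, 3, 22)),
]


def first_start_per_user(rows):
    """Return {user_id: earliest starts_at} (hypothetical helper)."""
    first = {}
    for _id, user_id, starts_at, _ends_at in rows:
        if user_id not in first or starts_at < first[user_id]:
            first[user_id] = starts_at
    return first


def new_subscribers_by_iso_week(rows):
    """Count first subscriptions per (ISO year, ISO week)."""
    counts = Counter()
    for starts_at in first_start_per_user(rows).values():
        iso = starts_at.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


if __name__ == "__main__":
    weekly = new_subscribers_by_iso_week(SUBSCRIPTIONS)
    for (year, week), n in sorted(weekly.items()):
        print(year, week, n)
```

With the question's data this gives one new subscriber in ISO week 5 of 2019, two in week 6 and one in week 8. Weeks without new subscribers do not appear at all, which is the same limitation the SQL query has.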

Below is an updated query that combines my answer above with generate_series() and your preferred week date format.

SELECT
  TO_CHAR(date_trunc('week', w.date), 'YYYY-MM-DD') AS week,
  COUNT(s.id) AS count
FROM generate_series('2018-12-31', NOW(), INTERVAL '1 WEEK') w(date)
-- Keep the first-subscription test inside the join condition; filtering on it
-- in WHERE would drop weeks that only contain repeat subscriptions.
LEFT JOIN subscriptions s
  ON s.starts_at >= w.date
  AND s.starts_at < w.date + INTERVAL '1 WEEK'
  AND NOT EXISTS (
    SELECT 1 FROM subscriptions s1
    WHERE s1.user_id = s.user_id AND s1.starts_at < s.starts_at
  )
GROUP BY w.date
ORDER BY w.date DESC;

DB Fiddle: https://www.db-fiddle.com/f/b73AbU3KU6dsfTvXu3mzjz/1
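
To see what the zero-filled result should look like, here is a small Python sketch that mimics the generate_series() idea: build every Monday in a range, start each week at 0, then add the first subscriptions that fall into that week. FIRST_STARTS holds the earliest start date per user from the question's sample data; week_start, weekly_counts and the chosen date range are assumptions made for this illustration.

```python
from datetime import date, timedelta

# Earliest starts_at per user in the question's sample data
# (users 126, 233, 296 and 922).
FIRST_STARTS = [
    date(2019, 2, 1),
    date(2019, 2, 4),
    date(2019, 2, 9),
    date(2019, 2, 22),
]


def week_start(d):
    """Monday of the week containing d, similar to date_trunc('week', d)."""
    return d - timedelta(days=d.weekday())


def weekly_counts(first_starts, series_start, series_end):
    """Return [(monday, count)] for every week in the range, zeros included."""
    counts = {}
    monday = week_start(series_start)
    while monday <= series_end:
        counts[monday] = 0  # every generated week starts at zero
        monday += timedelta(weeks=1)
    for d in first_starts:
        wk = week_start(d)
        if wk in counts:  # ignore dates outside the generated range
            counts[wk] += 1
    return sorted(counts.items())


if __name__ == "__main__":
    for monday, n in weekly_counts(FIRST_STARTS, date(2019, 1, 28), date(2019, 3, 10)):
        print(monday.isoformat(), n)
```

For the range 2019-01-28 to 2019-03-10 this yields six weeks with counts 1, 2, 0, 1, 0, 0, so the weeks of 2019-02-11, 2019-02-25 and 2019-03-04 are listed with 0 instead of being left out.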


0 votes

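I gave fubar's solution a +1. It works on all RDBMSs.

I'll offer another approach, which is specific to Postgres because it relies on DISTINCT ON.

Find the date of each user's first subscription: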

select 
    distinct on (s.user_id)

    s.*

from subscriptions s
order by s.user_id, s.starts_at;

Output:

| id  | user_id | starts_at                | ends_at                  |
| --- | ------- | ------------------------ | ------------------------ |
| 4   | 126     | 2019-02-01T00:00:00.000Z | 2019-02-28T00:00:00.000Z |
| 1   | 233     | 2019-01-04T00:00:00.000Z | 2019-03-03T00:00:00.000Z |
| 3   | 296     | 2019-02-09T00:00:00.000Z | 2019-03-08T00:00:00.000Z |
| 6   | 922     | 2019-02-22T00:00:00.000Z | 2019-03-22T00:00:00.000Z |

Schema:

CREATE TABLE subscriptions (
  id INT NOT NULL,
  user_id INT NOT NULL,
  starts_at DATE,
  ends_at DATE,
  PRIMARY KEY(id)
);

INSERT INTO subscriptions VALUES
  (1, 233, '2019-01-04', '2019-03-03'),
  (2, 233, '2019-03-04', '2019-04-04'),
  (3, 296, '2019-02-09', '2019-03-08'),
  (4, 126, '2019-02-01', '2019-02-28'),
  (5, 126, '2019-03-01', '2019-03-31'),
  (6, 922, '2019-02-22', '2019-03-22');

Get the number of new subscribers per week:

Live test: https://www.db-fiddle.com/f/vhzw4KvANA6Mvi59NDTy3H/0

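with first_time
as
(
    -- one row per user: the subscription with the earliest starts_at
    select
        distinct on (s.user_id)
        s.*
    from subscriptions s
    order by s.user_id, s.starts_at
)
select gs.wk, count(ft.*) as new_subscribers_for_the_week
from
    generate_series('2019-02-25'::date, now()::date, interval '1 week') gs(wk)
-- This condition matches every week whose start date (gs.wk) falls within a
-- user's first subscription period, so a user is counted in each such week,
-- not only in the week the subscription starts.
left join first_time ft
    on gs.wk >= ft.starts_at and gs.wk <= ft.ends_at
group by gs.wk
order by gs.wk;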

Output:

| wk                       | new_subscribers_for_the_week |
| ------------------------ | ---------------------------- |
| 2019-02-25T00:00:00.000Z | 4                            |
| 2019-03-04T00:00:00.000Z | 2                            |
| 2019-03-11T00:00:00.000Z | 1                            |
| 2019-03-18T00:00:00.000Z | 1                            |
| 2019-03-25T00:00:00.000Z | 0                            |
| 2019-04-01T00:00:00.000Z | 0                            |
| 2019-04-08T00:00:00.000Z | 0                            |
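
The join condition in the query above tests whether the week's start date lies between starts_at and ends_at, which is why 2019-02-25 shows 4 even though no first subscription begins in that week. The Python sketch below contrasts that test with a "started in this week" test, using the first subscription per user from the schema above. FIRST_TIME, active_on and started_in_week are names invented for this illustration.

```python
from datetime import date, timedelta

# First subscription per user from the schema above:
# (user_id, starts_at, ends_at).
FIRST_TIME = [
    (126, date(2019, 2, 1), date(2019, 2, 28)),
    (233, date(2019, 1, 4), date(2019, 3, 3)),
    (296, date(2019, 2, 9), date(2019, 3, 8)),
    (922, date(2019, 2, 22), date(2019, 3, 22)),
]


def active_on(wk, rows):
    """Mirror of the join above: first subscriptions whose period contains wk."""
    return sum(1 for _user, starts, ends in rows if starts <= wk <= ends)


def started_in_week(wk, rows):
    """First subscriptions that start in the 7 days beginning at wk."""
    week_end = wk + timedelta(weeks=1)
    return sum(1 for _user, starts, _ends in rows if wk <= starts < week_end)


if __name__ == "__main__":
    for wk in (date(2019, 2, 18), date(2019, 2, 25), date(2019, 3, 4)):
        print(wk.isoformat(), active_on(wk, FIRST_TIME), started_in_week(wk, FIRST_TIME))
```

If the goal is the number of users whose first subscription starts in a given week, the second test is the one to translate back into the join condition, for example ft.starts_at >= gs.wk and ft.starts_at < gs.wk + interval '1 week'.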