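I have a log table containing user activity. I'm trying to write a query that shows, for each period, the number of unique users and the number of new users.
Sample data: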
| uid | act | tm |
| --- | --- | ------------------------ |
| 1 | l | 2019-01-02T00:00:00.000Z |
| 1 | l | 2019-01-05T00:00:00.000Z |
| 2 | l | 2019-02-02T00:00:00.000Z |
| 1 | l | 2019-02-03T00:00:00.000Z |
| 2 | l | 2019-02-04T00:00:00.000Z |
| 3 | l | 2019-02-05T00:00:00.000Z |
| 1 | l | 2019-03-02T00:00:00.000Z |
| 2 | l | 2019-03-02T00:00:00.000Z |
| 3 | l | 2019-03-02T00:00:00.000Z |
| 4 | l | 2019-03-02T00:00:00.000Z |
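The first part is easy: count(distinct(uid)) as tot_users
But is there a way to do the second part, i.e. count the users who appear in that period but never appeared before it?
Here is what I have so far: https://www.db-fiddle.com/f/8EXsih1VAL1iWXKeauPQiB/1
For future reference, I have updated the db-fiddle with the 2 proposed solutions. Both work well:
https://www.db-fiddle.com/f/8EXsih1VAL1iWXKeauPQiB/6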
SELECT
to_char( date_trunc('month', tm), 'YYYY-MM') as mnth,
count(uid) as tot_entries,
COUNT(DISTINCT uid) as tot_users,
COUNT(DISTINCT
CASE
WHEN DATE_TRUNC('month', min_tm) = DATE_TRUNC('month', tm)
THEN uid
END) AS new_users
FROM (SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l) x
GROUP BY mnth
ORDER BY mnth;
SELECT
to_char(date_trunc('month', l1.tm), 'YYYY-MM') mnth,
count(l1.uid) tot_entries,
count(DISTINCT l1.uid) tot_users,
count(DISTINCT
CASE
WHEN NOT EXISTS (SELECT *
FROM logs l2
WHERE l2.uid = l1.uid
AND to_char(date_trunc('month', l2.tm), 'YYYY-MM') < to_char(date_trunc('month', l1.tm), 'YYYY-MM'))
THEN
l1.uid
END) new_users
FROM logs l1
GROUP BY mnth
ORDER BY mnth;
You can use a window function in a subquery to compute the timestamp of each user's first log entry, for example:
SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l
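Then you can analyze the result in the outer query. When the date of a user's first log entry falls within the analysis interval, you can count that user as new.
Assuming that the parameters :start_tm and :end_tm represent the start (inclusive) and end (exclusive) of the analysis period, you would write: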
SELECT
COUNT(DISTINCT uid) as tot_users,
COUNT(DISTINCT CASE WHEN min_tm >= :start_tm AND min_tm < :end_tm THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l) x
WHERE tm >= :start_tm AND tm < :end_tm
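To check the parameterized query against the sample data from the question, here is a minimal sketch that runs it from Python. It assumes an in-memory SQLite database (3.25 or later, for window functions) rather than PostgreSQL, and it stores tm as ISO-8601 text, so the comparisons are plain string comparisons. The helper name count_users is made up for the illustration.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm). Timestamps are kept as
# ISO-8601 text, which sorts chronologically under string comparison.
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", ROWS)

# Same shape as the query above: min_tm is each user's first entry overall,
# and a user is new when that first entry falls inside the analysis period.
QUERY = """
SELECT
    COUNT(DISTINCT uid) AS tot_users,
    COUNT(DISTINCT CASE WHEN min_tm >= :start_tm AND min_tm < :end_tm
                        THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) AS min_tm FROM logs l) x
WHERE tm >= :start_tm AND tm < :end_tm
"""

def count_users(start_tm, end_tm):
    """Return (tot_users, tot_new_users) for the half-open period [start_tm, end_tm)."""
    params = {"start_tm": start_tm, "end_tm": end_tm}
    return conn.execute(QUERY, params).fetchone()

print(count_users("2019-02-01", "2019-03-01"))  # (3, 2)
```

For February 2019, users 1, 2 and 3 are active, and users 2 and 3 are new because their first entries fall in that month. Note that the window function has to run before the WHERE filter, which is why it sits in the subquery; filtering first would make every user look new.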
If you need the results aggregated by month:
SELECT
DATE_TRUNC('month', tm) AS my_month,
COUNT(DISTINCT uid) as tot_users,
COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', min_tm) = DATE_TRUNC('month', tm) THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l) x
GROUP BY my_month
ORDER BY my_month
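You can use conditional aggregation. In a CASE expression, check whether a log entry for the same user exists in an earlier month. Return the user's ID only if no such entry is found, and use that expression as the argument of count(DISTINCT ...).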
SELECT to_char(date_trunc('month', l1.tm), 'YYYY-MM') mnth,
count(l1.uid) tot_entries,
count(DISTINCT l1.uid) tot_users,
count(DISTINCT CASE
WHEN NOT EXISTS (SELECT *
FROM logs l2
WHERE l2.uid = l1.uid
AND to_char(date_trunc('month', l2.tm), 'YYYY-MM') < to_char(date_trunc('month', l1.tm), 'YYYY-MM')) THEN
l1.uid
END) new_users
FROM logs l1
GROUP BY mnth
ORDER BY mnth;
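As a quick sanity check of the NOT EXISTS approach on the sample data, here is a rough sketch run from Python. It assumes an in-memory SQLite database instead of PostgreSQL, with tm stored as ISO-8601 text, so substr(tm, 1, 7) stands in for to_char(date_trunc('month', tm), 'YYYY-MM'). The name monthly_stats is invented for the example.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm), with tm as ISO-8601 text.
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", ROWS)

# substr(tm, 1, 7) yields 'YYYY-MM' (assumption: SQLite stand-in for the
# PostgreSQL to_char/date_trunc pair). A user counts as new in a month when
# no entry for the same uid exists in any earlier month.
QUERY = """
SELECT substr(l1.tm, 1, 7) AS mnth,
       COUNT(l1.uid) AS tot_entries,
       COUNT(DISTINCT l1.uid) AS tot_users,
       COUNT(DISTINCT CASE
                 WHEN NOT EXISTS (SELECT 1
                                  FROM logs l2
                                  WHERE l2.uid = l1.uid
                                    AND substr(l2.tm, 1, 7) < substr(l1.tm, 1, 7))
                 THEN l1.uid
             END) AS new_users
FROM logs l1
GROUP BY mnth
ORDER BY mnth
"""

monthly_stats = conn.execute(QUERY).fetchall()
for row in monthly_stats:
    print(row)
# ('2019-01', 2, 1, 1)
# ('2019-02', 4, 3, 2)
# ('2019-03', 4, 4, 1)
```

The CASE yields NULL for returning users, and COUNT(DISTINCT ...) ignores NULLs, so only first-time users are counted. The correlated subquery is evaluated per row, so on a large table an index on (uid, tm) would probably help.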
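You can use a HAVING clause or a self-join. You mentioned a period, so I'm not sure about the exact filter, but assuming a simple case, you could do something like this: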
select
uid,
case when mintm<'2019-03-02T00:00:00.000Z' --cutoff
then 'old' else 'new'
end flag
from (
select uid, min(tm) mintm from logs
group by uid ) as first_logins
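Here is one way the idea above might look when run end to end, including the HAVING variant that the answer mentions but does not show. This is only a sketch: it assumes an in-memory SQLite database with tm stored as ISO-8601 text, the table name logs from the other answers, and a cutoff passed as a :cutoff parameter. The names FLAG_QUERY, HAVING_QUERY, flags and new_uids are made up for the illustration.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm), with tm as ISO-8601 text.
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", ROWS)

CUTOFF = "2019-03-02T00:00:00.000Z"

# Flag every user as 'old' or 'new' depending on whether the first login
# happened before the cutoff (same logic as the query above).
FLAG_QUERY = """
SELECT uid,
       CASE WHEN mintm < :cutoff THEN 'old' ELSE 'new' END AS flag
FROM (SELECT uid, MIN(tm) AS mintm FROM logs GROUP BY uid) AS first_logins
ORDER BY uid
"""

# HAVING variant: list only the users whose first login is at or after the cutoff.
HAVING_QUERY = """
SELECT uid
FROM logs
GROUP BY uid
HAVING MIN(tm) >= :cutoff
ORDER BY uid
"""

flags = dict(conn.execute(FLAG_QUERY, {"cutoff": CUTOFF}).fetchall())
new_uids = [row[0] for row in conn.execute(HAVING_QUERY, {"cutoff": CUTOFF})]

print(flags)     # {1: 'old', 2: 'old', 3: 'old', 4: 'new'}
print(new_uids)  # [4]
```

Both queries only look at a lower cutoff. If "new" should mean "first seen within a bounded period", an upper bound on MIN(tm) would be needed as well.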