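I'm writing a SQL query in PostgreSQL that needs to rank people who "arrive" at a location. Not everyone arrives, though. I'm using the rank() window function to generate the arrival ranks, but where the arrival time is NULL, rank() simply treats those rows as arriving after everyone else instead of returning a NULL rank. What I want is for these no-shows to get a rank of NULL instead of this imputed rank.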
Here's an example. Suppose I have a table dinner_show_up that looks like this:
| Person | arrival_time | Restaurant |
+--------+--------------+------------+
| Dave | 7 | in_and_out |
| Mike | 2 | in_and_out |
| Bob | NULL | in_and_out |
Bob never showed up. The query I'm writing is:
select Person,
rank() over (partition by Restaurant order by arrival_time asc)
as arrival_rank
from dinner_show_up;
The result is:
| Person | arrival_rank |
+--------+--------------+
| Dave | 2 |
| Mike | 1 |
| Bob | 3 |
What I want instead is this:
| Person | arrival_rank |
+--------+--------------+
| Dave | 2 |
| Mike | 1 |
| Bob | NULL |
Just wrap rank() in a case expression:
select Person,
(case when arrival_time is not null
then rank() over (partition by Restaurant order by arrival_time asc)
end) as arrival_rank
from dinner_show_up;
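A more general solution, one that works for all window functions and not just rank(), is to partition by "arrival_time is not null" in the over() clause. This puts all rows with a NULL arrival_time into one group, where they share the same rank, so the non-NULL rows are ranked only relative to each other.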
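To provide a meaningful example, I mocked up a CTE with more rows than the original question's data set. Please forgive the wide lines, but I think they contrast the different techniques better.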
with dinner_show_up("person", "arrival_time", "restaurant") as (values
('Dave' , 7, 'in_and_out')
,('Mike' , 2, 'in_and_out')
,('Bob' , null, 'in_and_out')
,('Peter', 3, 'in_and_out')
,('Jane' , null, 'in_and_out')
,('Merry', 5, 'in_and_out')
,('Sam' , 5, 'in_and_out')
,('Pip' , 9, 'in_and_out')
)
select
person
,case when arrival_time is not null then rank() over ( order by arrival_time) end as arrival_rank_without_partition
,case when arrival_time is not null then rank() over (partition by arrival_time is not null order by arrival_time) end as arrival_rank_with_partition
,case when arrival_time is not null then percent_rank() over ( order by arrival_time) end as arrival_pctrank_without_partition
,case when arrival_time is not null then percent_rank() over (partition by arrival_time is not null order by arrival_time) end as arrival_pctrank_with_partition
from dinner_show_up
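This query gives the same results for arrival_rank_without_partition and arrival_rank_with_partition. The percent_rank() results do differ, however: the without_partition version is wrong, ranging from 0% to 71.4%, while the with_partition version correctly gives a percent_rank() range of 0% to 100%.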
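The 71.4% figure follows from how PostgreSQL defines percent_rank(): (rank - 1) / (rows in partition - 1). As a small arithmetic sketch for Pip, the last arrival (the column aliases are made up for illustration):

```sql
-- Pip has rank 6 in both variants; only the partition size differs.
select
     round((6 - 1)::numeric / (8 - 1), 3) as pip_without_partition  -- 8 rows: 6 arrivals + 2 NULLs -> 0.714
    ,round((6 - 1)::numeric / (6 - 1), 3) as pip_with_partition     -- 6 rows: arrivals only       -> 1.000
;
```

The two no-shows inflate the denominator from 5 to 7, which is why the unpartitioned version never reaches 100%.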
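The same pattern also applies to the ntile() window function.
It works by separating all the NULLs from the non-NULLs for ranking purposes. This ensures that Jane and Bob are excluded from the 0% to 100% percentile ranking.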
|person|arrival_rank_without_partition|arrival_rank_with_partition|arrival_pctrank_without_partition|arrival_pctrank_with_partition|
+------+------------------------------+---------------------------+---------------------------------+------------------------------+
|Jane |null |null |null |null |
|Bob |null |null |null |null |
|Mike |1 |1 |0 |0 |
|Peter |2 |2 |0.14 |0.2 |
|Sam |3 |3 |0.28 |0.4 |
|Merry |3 |3 |0.28 |0.4 |
|Dave |5 |5 |0.57 |0.8 |
|Pip |6 |6 |0.71 |1.0 |
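As a rough sketch of the ntile() case mentioned above, here is the same comparison with ntile(2). The reduced sample data (four arrivals and one no-show) and the arrival_half_* column names are made up for illustration:

```sql
-- Hypothetical sample: four arrivals and one no-show (Bob).
with dinner_show_up("person", "arrival_time", "restaurant") as (values
     ('Dave' , 7,    'in_and_out')
    ,('Mike' , 2,    'in_and_out')
    ,('Bob'  , null, 'in_and_out')
    ,('Peter', 3,    'in_and_out')
    ,('Pip'  , 9,    'in_and_out')
)
select
     person
    -- Bob's NULL row is one of the 5 rows being split, so bucket 1 takes 3 arrivals.
    ,case when arrival_time is not null then ntile(2) over (                                   order by arrival_time) end as arrival_half_without_partition
    -- Only the 4 non-NULL rows are split, 2 per bucket.
    ,case when arrival_time is not null then ntile(2) over (partition by arrival_time is not null order by arrival_time) end as arrival_half_with_partition
from dinner_show_up
order by arrival_time;
```

With this data, Dave should land in bucket 1 without the partition and in bucket 2 with it, while Bob is NULL in both columns.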
select Person,
rank() over (partition by Restaurant order by arrival_time asc)
as arrival_rank
from dinner_show_up
where arrival_time is not null
union all
select Person,NULL as arrival_rank
from dinner_show_up
where arrival_time is null;
If you're using Snowflake SQL (I'm not sure whether this works in other SQL dialects), you can combine an "iff" condition with the window function to do something like this:
select Person, iff(arrival_time is null, null, rank() over (partition by Restaurant order by arrival_time asc)) as arrival_rank from dinner_show_up;