How to select only the rows with the max count of values per column in a subgroup?

Question (votes: 0, answers: 2)

The table tmch contains thousands of rows. Its definition:

CREATE TABLE IF NOT EXISTS public.tmch (
    id bigserial NOT NULL,
    year integer,
    week integer,
    my_number integer,
    device_id bigint,
    CONSTRAINT tmch_pkey PRIMARY KEY (id)
)

Sample data (columns: id, year, week, my_number, device_id):

1716446 2024    37  13  2
1716447 2024    37  13  2
1716448 2024    37  0   3
1716449 2024    37  11  4
1716450 2024    37  12  4
1716451 2024    37  0   6
1716452 2024    37  0   6
1716453 2024    37  0   6
1716454 2024    37  1   6
1716455 2024    37  1   6
1716456 2024    37  9   7
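
To reproduce the problem, the sample rows above could be loaded with a statement like the following. This is only a sketch: it assumes the columns in the sample appear in the same order as in the table definition (id, year, week, my_number, device_id) and that the table has already been created.

```sql
-- Load the 11 sample rows shown above into public.tmch.
-- Assumption: sample columns map to (id, year, week, my_number, device_id).
INSERT INTO public.tmch (id, year, week, my_number, device_id)
VALUES (1716446, 2024, 37, 13, 2),
       (1716447, 2024, 37, 13, 2),
       (1716448, 2024, 37,  0, 3),
       (1716449, 2024, 37, 11, 4),
       (1716450, 2024, 37, 12, 4),
       (1716451, 2024, 37,  0, 6),
       (1716452, 2024, 37,  0, 6),
       (1716453, 2024, 37,  0, 6),
       (1716454, 2024, 37,  1, 6),
       (1716455, 2024, 37,  1, 6),
       (1716456, 2024, 37,  9, 7);
```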

Here is a query that counts how many times each my_number occurs per group (week, year, device_id):

select count(my_number) c, my_number, device_id, year, week from tmch
group by my_number, device_id, year, week
order by device_id asc, c desc

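Result of this query, based on a larger, different sample (to make the problem clearer). Columns: c, my_number, device_id, year, week: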

6   16  2   2024    37
4   17  2   2024    37
4   15  2   2024    37
4   0   2   2024    37
3   11  2   2024    37
3   14  2   2024    37
2   13  2   2024    37
2   1   2   2024    37
2   18  2   2024    37
2   12  2   2024    37
1   10  2   2024    37
1   2   2   2024    37
8   15  3   2024    37
6   16  3   2024    37
5   14  3   2024    37
4   17  3   2024    37
4   12  3   2024    37
3   7   3   2024    37
3   20  3   2024    37  
3   18  3   2024    37
3   19  3   2024    37
3   4   3   2024    37
3   5   3   2024    37
3   6   3   2024    37
1   21  3   2024    37
1   0   3   2024    37
1   3   3   2024    37
1   8   3   2024    37

How can I get only the row with the highest count of my_number for each group (week, year, device_id)?

Expected result for the example above:

6   16  2   2024    37  -- because my_number=16 occurs 6 times for device_id=2 y=2024 w=37
8   15  3   2024    37  -- because my_number=15 occurs 8 times for device_id=3 y=2024 w=37

I tried row_number() over (partition by ...) but without success.

sql postgresql aggregate greatest-n-per-group
2 Answers

2 votes

If you use your query as a subquery, DISTINCT ON does what you want:

SELECT DISTINCT ON (device_id, week, year)
       c, my_number, device_id, year, week
FROM (SELECT count(my_number) AS c,
             my_number,
             device_id,
             year,
             week
      FROM tmch
      GROUP BY my_number, device_id, year, week) AS sub
ORDER BY device_id, week, year, c DESC;
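
For comparison, the row_number() approach mentioned in the question can also be made to work. The following is a sketch, not the asker's original attempt (which is not shown): it ranks the aggregated rows within each (device_id, year, week) group by their count and keeps only the top-ranked one. The names rn and ranked are arbitrary aliases chosen for illustration.

```sql
-- Rank each my_number within its (device_id, year, week) group by count,
-- then keep only the top-ranked row per group.
SELECT c, my_number, device_id, year, week
FROM  (SELECT count(my_number) AS c,
              my_number,
              device_id,
              year,
              week,
              -- Window functions run after GROUP BY, so they can order by the aggregate.
              row_number() OVER (PARTITION BY device_id, year, week
                                 ORDER BY count(my_number) DESC) AS rn
       FROM tmch
       GROUP BY my_number, device_id, year, week) AS ranked
WHERE rn = 1
ORDER BY device_id, year, week;
```

The outer query is needed because a window function result cannot be filtered in the WHERE clause of the query level that computes it. If several values tie for the highest count, row_number() keeps an arbitrary one of them; using rank() instead would return all tied rows.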

0 votes

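In a SELECT query, DISTINCT ON is applied after aggregation with GROUP BY and aggregate functions (and even after window functions). So you can do it all at a single query level, without a subquery.
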
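Since you list my_number in the GROUP BY clause, it makes more sense to use count(*) instead of count(my_number). It is also a bit faster.

The only logical difference: if my_number can be null, count(*) gives you the actual count for the null group, too. With count(my_number), the count for the null group would be 0.
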
SELECT DISTINCT ON (device_id, year, week)
       count(*) AS c, my_number, device_id, year, week
FROM   tmch
GROUP  BY device_id, year, week, my_number
ORDER  BY device_id, year, week, c DESC;
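
The difference between count(*) and count(my_number) for null values can be seen in a small standalone sketch. The VALUES list below is made-up data for illustration only and does not come from the tmch table.

```sql
-- Made-up data: two rows with my_number = 1 and three rows with my_number NULL.
SELECT my_number,
       count(*)         AS count_star,  -- counts every row in the group
       count(my_number) AS count_col    -- counts only non-null values
FROM  (VALUES (1), (1), (NULL), (NULL), (NULL)) AS t(my_number)
GROUP BY my_number
ORDER BY my_number;

-- Result:
--  my_number | count_star | count_col
--  1         | 2          | 2
--  NULL      | 3          | 0
```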

The same query with a more intuitive column order:

SELECT DISTINCT ON (device_id, year, week)
       device_id, year, week, my_number, count(*) AS c
FROM   tmch
GROUP  BY device_id, year, week, my_number
ORDER  BY device_id, year, week, c DESC;

Or with minimal syntax, using positional references:

SELECT DISTINCT ON (1, 2, 3)
       device_id, year, week, my_number, count(*) AS c
FROM   tmch
GROUP  BY 1, 2, 3, 4
ORDER  BY 1, 2, 3, c DESC;


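Just make sure that ORDER BY does not disagree with DISTINCT ON: the leading ORDER BY expressions must match the DISTINCT ON expressions. See: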

  • Select first row in each GROUP BY group?
  • Best way to get result count before LIMIT was applied
  • Unexpected behavior of count on an empty table
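
As a sketch of the ORDER BY rule above: the leading ORDER BY expressions repeat the DISTINCT ON expressions, and any further expressions decide which row wins within each group. The variant below adds my_number as a tie-breaker, on the assumption (not stated in the question) that the smallest my_number should be preferred when several values share the highest count. Without such a tie-breaker, the row chosen among ties is arbitrary.

```sql
SELECT DISTINCT ON (device_id, year, week)
       device_id, year, week, my_number, count(*) AS c
FROM   tmch
GROUP  BY device_id, year, week, my_number
ORDER  BY device_id, year, week,  -- must match the DISTINCT ON expressions
          c DESC,                 -- highest count first within each group
          my_number;              -- assumed tie-breaker: smallest my_number wins
```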