How to group by multiple columns and aggregate the last column in Redshift

Not sure if it's just me, since this looks like it should be simple:

+----------+----------+---------------------+
| user_id  | country  | country_probability |
+----------+----------+---------------------+
| 10000022 | France   | 0.126396313         |
| 10000022 | Italy    | 0.343407512         |
| 10000022 | England  | 0.161236539         |
| 10000044 | China    | 0.061884698         |
| 10000044 | S. Korea | 0.043251887         |
| 10000044 | Japan    | 0.65095371          |
| 10000046 | USA      | 0.215771168         |
| 10000046 | Canada   | 0.214556068         |
| 10000046 | Mexico   | 0.081350066         |
+----------+----------+---------------------+

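In Redshift, how do I group this so that the output has one row per unique user_id, containing the country with the highest probability and that country's probability for the user_id?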

The expected output would be:

+----------+---------+---------------------+
| user_id  | country | country_probability |
+----------+---------+---------------------+
| 10000022 | Italy   | 0.343407512         |
| 10000044 | Japan   | 0.65095371          |
| 10000046 | USA     | 0.215771168         |
+----------+---------+---------------------+

Thanks, and sorry if this is a duplicate post... I tried searching but couldn't find much. GROUP BY seems to behave differently in Redshift than in MySQL...

sql amazon-redshift
2 Answers

1 vote

Maybe something like this?

select user_id, country, country_probability
from your_table
where (user_id, country_probability) in 
      (select user_id, max(country_probability)
       from your_table
       group by user_id
      )

[EDIT: another option, using the analytic RANK function]

select user_id, country, country_probability
from (select user_id, country, 
        country_probability,
        rank() over (partition by user_id order by country_probability desc) rnk
        from your_table
     ) ranked
where rnk = 1; 
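One caveat with both queries above: if two countries tie for a user's highest probability, the IN version matches both rows and RANK() gives both rows rank 1, so that user appears twice. A minimal sketch that keeps exactly one row per user swaps in ROW_NUMBER() with a secondary sort key. The temp table only reloads the sample data from the question, `your_table` is a placeholder name, and breaking ties alphabetically by country is an arbitrary assumption.

```sql
-- Sketch: exactly one row per user_id, even when probabilities tie.
-- The temp table reproduces the question's sample data; your_table is a placeholder.
create temp table your_table (
    user_id             bigint,
    country             varchar(32),
    country_probability double precision
);

insert into your_table values
    (10000022, 'France',   0.126396313),
    (10000022, 'Italy',    0.343407512),
    (10000022, 'England',  0.161236539),
    (10000044, 'China',    0.061884698),
    (10000044, 'S. Korea', 0.043251887),
    (10000044, 'Japan',    0.65095371),
    (10000046, 'USA',      0.215771168),
    (10000046, 'Canada',   0.214556068),
    (10000046, 'Mexico',   0.081350066);

select user_id, country, country_probability
from (
    select user_id,
           country,
           country_probability,
           -- row_number() never repeats a number within a partition, so a tie on
           -- probability is broken by country name (an assumed tie-breaker)
           row_number() over (partition by user_id
                              order by country_probability desc, country) as rn
    from your_table
) ranked
where rn = 1
order by user_id;
```

On the sample data this returns Italy, Japan and USA, matching the expected output in the question. Replace the tie-breaker with whatever rule makes sense for the data.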

0 votes

It would be interesting to compare the performance of Littlefoot's approach with the following:

select distinct user_id,
       first_value(country) over (partition by user_id order by country_probability desc
                                  rows between unbounded preceding and unbounded following) as country,
       max(country_probability) over (partition by user_id) as country_probability
from t;

I generally don't like using SELECT DISTINCT for aggregation, but Redshift supports first_value() only as a window function, not as an aggregate.
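If SELECT DISTINCT feels wrong here, the same window result can be collapsed with an ordinary GROUP BY instead. A sketch, assuming the question's sample data in a temp table with the placeholder name `your_table`: first_value(...) returns the same value on every row of a user's partition, so max() just picks that value back out, and max(country_probability) is the top probability.

```sql
-- Sketch: GROUP BY instead of SELECT DISTINCT around first_value().
-- The temp table reproduces the question's sample data; your_table is a placeholder.
create temp table your_table (
    user_id             bigint,
    country             varchar(32),
    country_probability double precision
);

insert into your_table values
    (10000022, 'France',   0.126396313),
    (10000022, 'Italy',    0.343407512),
    (10000022, 'England',  0.161236539),
    (10000044, 'China',    0.061884698),
    (10000044, 'S. Korea', 0.043251887),
    (10000044, 'Japan',    0.65095371),
    (10000046, 'USA',      0.215771168),
    (10000046, 'Canada',   0.214556068),
    (10000046, 'Mexico',   0.081350066);

select user_id,
       -- top_country is identical on every row of a user's partition,
       -- so max() simply returns that single value
       max(top_country)         as country,
       max(country_probability) as country_probability
from (
    select user_id,
           country_probability,
           -- Redshift expects an explicit frame clause when first_value() has an ORDER BY
           first_value(country) over (
               partition by user_id
               order by country_probability desc
               rows between unbounded preceding and unbounded following
           ) as top_country
    from your_table
) per_row
group by user_id
order by user_id;
```

Whether this is any faster than the DISTINCT version would have to be checked with EXPLAIN on the real table.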

© www.soinside.com 2019 - 2024. All rights reserved.