I'm not sure if I'm just overlooking something, because this seems like a simple problem:
+----------+----------+---------------------+
| user_id  | country  | country_probability |
+----------+----------+---------------------+
| 10000022 | France   | 0.126396313         |
| 10000022 | Italy    | 0.343407512         |
| 10000022 | England  | 0.161236539         |
| 10000044 | China    | 0.061884698         |
| 10000044 | S. Korea | 0.043251887         |
| 10000044 | Japan    | 0.65095371          |
| 10000046 | USA      | 0.215771168         |
| 10000046 | Canada   | 0.214556068         |
| 10000046 | Mexico   | 0.081350066         |
+----------+----------+---------------------+
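In Redshift, how do I group this so that my output has one row per unique user_id, with the country that has the highest probability and that country's probability for that user_id?
That would be: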
+----------+---------+---------------------+
| user_id  | country | country_probability |
+----------+---------+---------------------+
| 10000022 | Italy   | 0.343407512         |
| 10000044 | Japan   | 0.65095371          |
| 10000046 | USA     | 0.215771168         |
+----------+---------+---------------------+
Thanks, and apologies if this is a duplicate post... I tried searching but couldn't find much. Grouping seems to work differently in Redshift than in MySQL...
Maybe something like this?
select user_id, country, country_probability
from your_table
where (user_id, country_probability) in
  (select user_id, max(country_probability)
   from your_table
   group by user_id
  )
[Edit: another option, using the RANK analytic function]
select user_id, country, country_probability
from (select user_id, country,
             country_probability,
             rank() over (partition by user_id order by country_probability desc) rnk
      from your_table
     ) ranked
where rnk = 1;
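One caveat with rank(): if two countries tie for the highest probability for a user, both rows come back with rnk = 1. A minimal sketch, assuming you want exactly one row per user_id and are fine with breaking ties alphabetically by country (that tie-break rule is my assumption, not something from the question), swaps in row_number():

```sql
-- Sketch only: your_table is the placeholder name used above.
-- row_number() assigns 1, 2, 3, ... within each user_id, so exactly
-- one row per user gets rn = 1 even when probabilities tie.
select user_id, country, country_probability
from (select user_id, country,
             country_probability,
             row_number() over (partition by user_id
                                order by country_probability desc,
                                         country  -- assumed tie-breaker
                               ) as rn
      from your_table
     ) ranked
where rn = 1;
```

On the sample data in the question there are no ties, so this should return the same three rows as the rank() version.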
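It would be interesting to compare the performance of Littlefoot's approach with the following:
select distinct user_id,
       -- explicit frame so first_value() sees the whole partition
       first_value(country) over (partition by user_id order by country_probability desc
                                  rows between unbounded preceding and unbounded following) as country,
       max(country_probability) over (partition by user_id) as country_probability
from t;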
I generally don't like using select distinct for aggregation, but Redshift supports first_value() only as a window function, not as an aggregate.