Preserving the order of records in a Hive collect

Question

I have a Hive table as follows:

select id, id_2, val from test order by id;

234 974 0.5
234 457 0.7
234 236 0.5
234 859 0.6
123 859 0.7
123 236 0.6
123 974 0.5
123 457 0.5

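I am trying to collect the id_2 and val values, grouped by id. The collected data must follow the same order in every row. My expected output is as follows (any order is fine, as long as it is the same for all rows):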

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [974,457,236,859]   [0.5,0.5,0.6,0.7]

I used the collect UDF from Brickhouse:

select tmp.id, collect(id_2), collect(tmp.val) from
(select id, id_2, val from test
order by id) tmp
group by tmp.id
;

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [859,236,974,457]   [0.7,0.6,0.5,0.5]

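As you can see, the order of the collected values is not preserved: id 123 gets a different id_2 order than id 234. Is there any way to keep the order consistent across the entire output? Any hints would be appreciated.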
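The symptom can be modeled in a few lines of Python. This is an in-memory sketch, not Hive: ROWS, collect_in_arrival_order and collect_sorted_by_key are hypothetical names, and the order of ROWS is assumed to be the order in which the collector sees the rows. An order-unaware collect simply appends values as rows arrive, so each id inherits whatever order its rows happened to have; sorting each group's (id_2, val) pairs by a shared key gives every row the same order while keeping each id_2 aligned with its val.

```python
from collections import defaultdict

# Hypothetical input: the rows from the question, in the order shown by "order by id".
ROWS = [
    (234, 974, 0.5), (234, 457, 0.7), (234, 236, 0.5), (234, 859, 0.6),
    (123, 859, 0.7), (123, 236, 0.6), (123, 974, 0.5), (123, 457, 0.5),
]

def collect_in_arrival_order(rows):
    """Mimic an order-unaware collect: values are appended in the order rows arrive."""
    groups = defaultdict(lambda: ([], []))
    for id_, id_2, val in rows:
        groups[id_][0].append(id_2)
        groups[id_][1].append(val)
    return dict(groups)

def collect_sorted_by_key(rows):
    """Collect (id_2, val) pairs per id, then sort each group by id_2."""
    pairs = defaultdict(list)
    for id_, id_2, val in rows:
        pairs[id_].append((id_2, val))
    result = {}
    for id_, items in pairs.items():
        items.sort()  # sorts by id_2 first; each val stays paired with its id_2
        result[id_] = ([k for k, _ in items], [v for _, v in items])
    return result
```

With this input, collect_in_arrival_order reproduces the mismatched output from the question, while collect_sorted_by_key yields the id_2 order [236, 457, 859, 974] for both ids.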

Answer 1

Use this query:

select tmp.id, collect(id_2), collect(tmp.val) from
(select id, id_2, val from test
order by id desc, id_2 desc) tmp
group by tmp.id
;

The output is:

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [974,457,236,859]   [0.5,0.5,0.6,0.7]

Basically, I changed

order by id

to

order by id desc, id_2 desc
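The idea behind this answer can be modeled in Python, assuming the collector receives rows exactly in the subquery's sort order (the next answer points out that Hive does not guarantee this). collect_after_sort and ROWS are hypothetical names used only for this sketch. Because id_2 is part of the sort key, every id ends up with the same id_2 order.

```python
from collections import defaultdict

# Hypothetical input: the rows from the question.
ROWS = [
    (234, 974, 0.5), (234, 457, 0.7), (234, 236, 0.5), (234, 859, 0.6),
    (123, 859, 0.7), (123, 236, 0.6), (123, 974, 0.5), (123, 457, 0.5),
]

def collect_after_sort(rows):
    """Sort by (id desc, id_2 desc), then collect per id in that order.

    Models the assumption that the collect sees rows in the subquery's order.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1]), reverse=True)
    groups = defaultdict(lambda: ([], []))
    for id_, id_2, val in ordered:
        groups[id_][0].append(id_2)
        groups[id_][1].append(val)
    return dict(groups)
```

Under this model, both ids receive the id_2 order [974, 859, 457, 236], with each val still aligned to its id_2.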

Answer 2

Note that in SQL (and therefore in Hive), a subquery or CTE (common table expression) does not preserve the order of its data.

This query (which uses only standard Hive functions):

with
cte_data_test as (
    select 234 as id, 974 as id_2, 0.5 as val union all
    select 234 as id, 457 as id_2, 0.7 as val union all
    select 234 as id, 236 as id_2, 0.5 as val union all
    select 234 as id, 859 as id_2, 0.6 as val union all
    select 123 as id, 859 as id_2, 0.7 as val union all
    select 123 as id, 236 as id_2, 0.6 as val union all
    select 123 as id, 974 as id_2, 0.5 as val union all
    select 123 as id, 457 as id_2, 0.5 as val
    order by rand() -- only to simulate that a CTE does not preserve order
)
select
    id, 
    regexp_replace( -- remove temporary prefix
        concat_ws( -- concat array with separator
            ',',
            sort_array( -- sort on temporary prefix
                collect_list(
                    concat(
                        '<<<',
                        lpad(id_2, 9, '0'), -- add a temporary, lexicographically sortable prefix
                        '>>>',
                        val
                    )
                )
            )
        ),
        '<<<[0-9]{9}>>>',
        ''
    ) as ordered_collect
from
    cte_data_test
group by
    id

produces:

id	ordered_collect
123	0.6,0.5,0.7,0.5
234	0.5,0.7,0.6,0.5
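The prefix trick in this query can be traced step by step in Python. This is a sketch that mirrors the Hive functions, not Hive itself: ordered_collect and ROWS are hypothetical names, str.rjust stands in for lpad (assuming every id_2 fits in 9 digits, since Hive's lpad would truncate longer values), sorted plus ",".join stand in for sort_array and concat_ws, and re.sub stands in for regexp_replace. The zero padding matters because the prefixed values are sorted as strings: without it, "10" would sort before "9".

```python
import re

# Hypothetical input: the rows from the question, in arbitrary order.
ROWS = [
    (234, 974, 0.5), (234, 457, 0.7), (234, 236, 0.5), (234, 859, 0.6),
    (123, 859, 0.7), (123, 236, 0.6), (123, 974, 0.5), (123, 457, 0.5),
]

def ordered_collect(rows):
    """Prefix each val with '<<<' + zero-padded id_2 + '>>>', sort, join, then strip the prefixes."""
    grouped = {}
    for id_, id_2, val in rows:
        # str(id_2).rjust(9, "0") plays the role of lpad(id_2, 9, '0')
        grouped.setdefault(id_, []).append(f"<<<{str(id_2).rjust(9, '0')}>>>{val}")
    result = {}
    for id_, items in grouped.items():
        joined = ",".join(sorted(items))  # sort_array, then concat_ws(',', ...)
        result[id_] = re.sub(r"<<<[0-9]{9}>>>", "", joined)  # regexp_replace
    return result
```

Run on ROWS, ordered_collect returns the same strings as the Hive output: "0.6,0.5,0.7,0.5" for id 123 and "0.5,0.7,0.6,0.5" for id 234.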

Notes:

ordered_collect is a string; use the split function to turn it into an array.
To collect id_2 ordered by id_2, apply the same operation to id_2 instead of val.
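Both notes can be sketched together in Python: building the ordered string for id_2 and for val with the same id_2 prefix, then splitting each string back into an array. The names ordered_collect, to_array and ROWS are hypothetical; to_array models split(s, ',') followed by a per-element cast, which is an extra step here because Hive's split returns an array of strings.

```python
import re

# Hypothetical input: the rows from the question.
ROWS = [
    (234, 974, 0.5), (234, 457, 0.7), (234, 236, 0.5), (234, 859, 0.6),
    (123, 859, 0.7), (123, 236, 0.6), (123, 974, 0.5), (123, 457, 0.5),
]

def ordered_collect(rows, column):
    """Collect rows[column] per id (1 = id_2, 2 = val), always ordered by zero-padded id_2."""
    grouped = {}
    for row in rows:
        prefix = "<<<" + str(row[1]).rjust(9, "0") + ">>>"
        grouped.setdefault(row[0], []).append(prefix + str(row[column]))
    return {k: re.sub(r"<<<[0-9]{9}>>>", "", ",".join(sorted(v))) for k, v in grouped.items()}

def to_array(s, cast):
    """Like split(s, ','), followed by casting each element."""
    return [cast(x) for x in s.split(",")]
```

Because both strings are sorted by the same id_2 prefix, the i-th element of the id_2 array and the i-th element of the val array always belong to the same source row.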
