Preserving the order of records in a Hive collect


I have a Hive table like this:

select id, id_2, val from test order by id;

234 974 0.5
234 457 0.7
234 236 0.5
234 859 0.6
123 859 0.7
123 236 0.6
123 974 0.5
123 457 0.5

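I am trying to collect the id_2 and val values grouped by id. The collected data must follow the same order in every row. My expected output is shown below (any order is fine, as long as it is the same for all rows):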

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [974,457,236,859]   [0.5,0.5,0.6,0.7]

I used the Brickhouse collect UDF.

select tmp.id, collect(id_2), collect(tmp.val) from
(select id, id_2, val from test
order by id) tmp
group by tmp.id
;

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [859,236,974,457]   [0.7,0.6,0.5,0.5]

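As you can see, the order of the collected elements is not the same from row to row. Is there a way to keep the order consistent across the whole output? Any hints would be appreciated.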

hive hiveql
2 answers
2 votes

Use this query:

select tmp.id, collect(id_2), collect(tmp.val) from
(select id, id_2, val from test
order by id desc, id_2 desc) tmp
group by tmp.id
;

The output is as follows:

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [974,457,236,859]   [0.5,0.5,0.6,0.7]

Basically, I changed

order by id

to

order by id desc, id_2 desc

0 votes

Note that in SQL (and in Hive) a subquery or CTE (common table expression) does not preserve row order for the outer query.
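Because the order coming out of a subquery cannot be relied on, the ordering has to be applied to the collected array itself. As a minimal sketch, assuming the question's table test(id, id_2, val) and the built-in collect_list and sort_array functions rather than the Brickhouse UDF, a single column can be made consistent like this:

```sql
-- Sketch: sort the collected array instead of relying on subquery order.
-- Assumes the table test(id, id_2, val) from the question.
select
    id,
    sort_array(collect_list(id_2)) as id_2_sorted  -- ascending id_2 within each id
from
    test
group by
    id;
```

This only works for a column sorted by its own values. Sorting val the same way would order it by val, not by id_2, so the two arrays would no longer line up element by element.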

This query (using only standard Hive functions):

with
cte_data_test as (
    select 234 as id, 974 as id_2, 0.5 as val union all
    select 234 as id, 457 as id_2, 0.7 as val union all
    select 234 as id, 236 as id_2, 0.5 as val union all
    select 234 as id, 859 as id_2, 0.6 as val union all
    select 123 as id, 859 as id_2, 0.7 as val union all
    select 123 as id, 236 as id_2, 0.6 as val union all
    select 123 as id, 974 as id_2, 0.5 as val union all
    select 123 as id, 457 as id_2, 0.5 as val
    order by rand() -- just to simulate that a CTE does not preserve order
)
select
    id, 
    regexp_replace( -- remove temporary prefix
        concat_ws( -- concat array with separator
            ',',
            sort_array( -- sort on temporary prefix
                collect_list(
                    concat(
                        '<<<',
                         lpad(id_2, 9, '0'), -- add a temporary, zero-padded prefix that sorts lexicographically
                         '>>>',
                         val
                 )
                )
            )
        ),
        '<<<[0-9]{9}>>>',
        ''
    ) as ordered_collect
from
    cte_data_test
group by
    id

produces:

id ordered_collect
123 0.6,0.5,0.7,0.5
234 0.5,0.7,0.6,0.5

Notes:

  • ordered_collect is a string; use the split function to turn it into an array
  • To collect id_2 in id_2 order as well, apply the same operation to id_2
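
The two notes above can be combined into one query. The following is a sketch rather than a tested solution: it assumes the question's table test(id, id_2, val), non-negative id_2 values of at most 9 digits, and values that contain no commas. The column aliases id_2_ordered and val_ordered are made up for illustration.

```sql
-- Sketch: return both collected columns as arrays, ordered by id_2 within each id.
-- Assumes test(id, id_2, val); id_2 is non-negative and has at most 9 digits.
select
    id,
    split(
        regexp_replace(
            concat_ws(',', sort_array(collect_list(
                concat('<<<', lpad(cast(id_2 as string), 9, '0'), '>>>', cast(id_2 as string))
            ))),
            '<<<[0-9]{9}>>>',  -- strip the temporary sort prefix
            ''
        ),
        ','
    ) as id_2_ordered,  -- array<string>
    split(
        regexp_replace(
            concat_ws(',', sort_array(collect_list(
                concat('<<<', lpad(cast(id_2 as string), 9, '0'), '>>>', cast(val as string))
            ))),
            '<<<[0-9]{9}>>>',
            ''
        ),
        ','
    ) as val_ordered    -- array<string>, aligned with id_2_ordered
from
    test
group by
    id;
```

Both arrays are sorted on the same zero-padded id_2 prefix, so element i of val_ordered belongs to element i of id_2_ordered. Because split returns array<string>, the elements need a cast if numeric values are required downstream.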