I have a ClickHouse table with several array columns, for example:
timestamp sensor_type priority values
10:00:00 ['a','b','b','a','c','a','c'] [3, 2, 1, 5, 1, 2, 1] [7, 4, 1, 12, 3, 9, 2]
10:01:00 ['c','e','g','e','g'] [2, 4, 1, 2, 4] [23, 3, 5, 8, 6]
...
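The timestamps are unique and monotonically increasing. The set of sensors that record values changes from one timestamp to the next. For each timestamp, I am trying to group and sum the `values` array by `sensor_type` or by `priority`, so the expected aggregated columns are: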
timestamp sensor_type_sorted sum_val_by_sensor_type priority_sorted sum_val_by_priority
10:00:00 ['a', 'b', 'c'] [28, 5, 5] [1, 2, 3, 5] [6, 13, 7, 12]
10:01:00 ['c', 'e', 'g'] [23, 11, 11] [1, 2, 4] [5, 31, 9]
...
How can I achieve this?
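To pin down the expected semantics, here is a minimal Python sketch of the per-row operation (plain Python, not ClickHouse). It pairs each key with the value at the same position, sums the values per distinct key, and returns the sorted keys together with their sums. The function name `group_sum` and the dict layout of the sample rows are illustrative assumptions.

```python
from collections import defaultdict


def group_sum(keys, values):
    """Sum `values` per distinct key; `keys` and `values` are parallel arrays.

    Returns (sorted_keys, sums) where sums[i] is the total for sorted_keys[i].
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
    sorted_keys = sorted(totals)
    return sorted_keys, [totals[k] for k in sorted_keys]


# Sample rows from the question (timestamp -> parallel arrays).
rows = {
    "10:00:00": {
        "sensor_type": ["a", "b", "b", "a", "c", "a", "c"],
        "priority": [3, 2, 1, 5, 1, 2, 1],
        "values": [7, 4, 1, 12, 3, 9, 2],
    },
    "10:01:00": {
        "sensor_type": ["c", "e", "g", "e", "g"],
        "priority": [2, 4, 1, 2, 4],
        "values": [23, 3, 5, 8, 6],
    },
}

for ts, row in rows.items():
    types, by_type = group_sum(row["sensor_type"], row["values"])
    prios, by_prio = group_sum(row["priority"], row["values"])
    print(ts, types, by_type, prios, by_prio)
# 10:00:00 ['a', 'b', 'c'] [28, 5, 5] [1, 2, 3, 5] [6, 13, 7, 12]
# 10:01:00 ['c', 'e', 'g'] [23, 11, 11] [1, 2, 4] [5, 31, 9]
```

The printed rows match the expected aggregated columns in the question, which confirms that the sums are taken over positionally aligned elements.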
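First turn the elements of the sensor type and value arrays into rows. Then compute the sums with a GROUP BY on (id, type), and finally turn the rows back into arrays. Here id, type and value stand for the question's timestamp, sensor_type and values columns. Query 1: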
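-- Hive / Spark SQL syntax (LATERAL VIEW POSEXPLODE, collect_list); it does not run in ClickHouse.
select id, collect_list(typ) as sensor_type_sorted, collect_list(val) as sum_val_by_sensor_type
from (
  -- step 2: sum the values per (id, sensor type)
  select id, typ, sum(val) as val
  from (
    -- step 1: explode both arrays with their positions; the two lateral views yield
    -- every (type, value) pair, so keep only the pairs at the same position
    select s.id, s_type.typ, s_value.val
    from sensor s
    LATERAL VIEW POSEXPLODE(s.type) s_type as seqt, typ
    LATERAL VIEW POSEXPLODE(s.value) s_value as seqv, val
    where seqt = seqv
  ) temp1
  group by id, typ
  -- step 3 (outer query) relies on this order; collect_list does not formally guarantee it
  order by id, typ
) calculated
group by id;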
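The three steps of query 1 can be mirrored in plain Python. The model also shows why `where seqt=seqv` is needed: two lateral views over the same row produce the cross product of the two arrays, and only pairs taken from the same position belong together. This is an illustrative sketch, not Hive code; the `sensor` list of dicts and the helper names `explode`, `sum_by_id_and_type` and `collect_per_id` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical stand-in for the `sensor` table: id plays the role of timestamp.
sensor = [
    {"id": "10:00:00", "type": ["a", "b", "b", "a", "c", "a", "c"], "value": [7, 4, 1, 12, 3, 9, 2]},
    {"id": "10:01:00", "type": ["c", "e", "g", "e", "g"], "value": [23, 3, 5, 8, 6]},
]


def explode(rows):
    """Step 1: like two LATERAL VIEW POSEXPLODE clauses plus `where seqt = seqv`."""
    out = []
    for s in rows:
        # The two lateral views yield every (type, value) combination ...
        for (seqt, typ), (seqv, val) in product(enumerate(s["type"]), enumerate(s["value"])):
            # ... so keep only the pairs that come from the same array position.
            if seqt == seqv:
                out.append((s["id"], typ, val))
    return out


def sum_by_id_and_type(exploded):
    """Step 2: group by (id, typ) and sum val."""
    totals = defaultdict(int)
    for id_, typ, val in exploded:
        totals[(id_, typ)] += val
    return totals


def collect_per_id(totals):
    """Step 3: group by id, collecting types and sums in (id, typ) order."""
    result = {}
    for (id_, typ), val in sorted(totals.items()):
        types, sums = result.setdefault(id_, ([], []))
        types.append(typ)
        sums.append(val)
    return result


query1 = collect_per_id(sum_by_id_and_type(explode(sensor)))
for id_, (types, sums) in query1.items():
    print(id_, types, sums)
```

In this model the explicit `sorted(...)` in step 3 plays the role of `order by id, typ`; sorting explicitly at collection time avoids depending on row order surviving a later grouping.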
Do the same with the priority and value columns to build query 2, then join query 1 and query 2 on id.
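Here is a compact Python model of that last step, again only an illustrative sketch. It condenses steps 1 to 3 into one hypothetical helper, `aggregate`, runs it once with `type` (query 1) and once with `priority` (query 2), and then inner-joins the two results on id. It assumes, as the question states, that each id (timestamp) occurs in exactly one row.

```python
from collections import defaultdict

# Hypothetical `sensor` rows: id stands for timestamp; the arrays are parallel.
sensor = [
    {"id": "10:00:00", "type": ["a", "b", "b", "a", "c", "a", "c"],
     "priority": [3, 2, 1, 5, 1, 2, 1], "value": [7, 4, 1, 12, 3, 9, 2]},
    {"id": "10:01:00", "type": ["c", "e", "g", "e", "g"],
     "priority": [2, 4, 1, 2, 4], "value": [23, 3, 5, 8, 6]},
]


def aggregate(rows, key_col):
    """Query 1 (key_col="type") or query 2 (key_col="priority"): id -> (sorted keys, sums)."""
    result = {}
    for s in rows:
        totals = defaultdict(int)
        for key, val in zip(s[key_col], s["value"]):
            totals[key] += val
        keys = sorted(totals)
        result[s["id"]] = (keys, [totals[k] for k in keys])
    return result


query1 = aggregate(sensor, "type")
query2 = aggregate(sensor, "priority")

# Inner join on id: keep ids present in both results and concatenate their columns.
joined = {
    id_: (*query1[id_], *query2[id_])
    for id_ in sorted(query1.keys() & query2.keys())
}
for id_, cols in joined.items():
    print(id_, *cols)
```

Each joined row holds the four aggregated columns from the question: sensor_type_sorted, sum_val_by_sensor_type, priority_sorted and sum_val_by_priority.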