I have a table with a column that contains lists like this:
id
[1,2,3,10]
[1]
[2,3,4,9]
The result I want is a table with the values unnested, like this:
id2
1
2
3
10
1
2
3
4
9
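I have tried different solutions that I found online (AWS documentation, SO answers, blog posts), but without any luck, because the column holds a list and not a JSON object. Any help is appreciated!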
Update (2022): Redshift now supports arrays (through the SUPER type) and makes it easy to "unnest" them.
The syntax is simply a FROM clause of this form:
FROM the_table AS the_table_alias, the_table_alias.the_array AS the_element_alias
Here is an example using the data from the question:
WITH
-- some table with test data
input_data as (
SELECT array(1,2,3,10) as id
union all
SELECT array(1) as id
union all
SELECT array(2,3,4,9) as id
)
SELECT
id2
FROM
input_data AS ids,
ids.id AS id2
This yields the expected result (row order is not guaranteed without an ORDER BY):
id2
---
1
2
3
4
9
1
2
3
10
See here for more details and examples with deeper nesting levels: https://docs.aws.amazon.com/redshift/latest/dg/query-super.html
What is the data type of that column?
Redshift did not support arrays when this was written, so let me assume this is a JSON string.
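Redshift does not provide JSON set-returning functions, so we need to unnest manually. Here is one way to do it, provided you have a table with enough rows (at least as many rows as there are elements in the longest array), say sometable: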
select json_extract_array_element_text(t.id, n.rn) as new_id
from mytable t
inner join (select row_number() over() - 1 as rn from sometable) n
on n.rn < json_array_length(t.id)
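To check the join logic without a cluster, here is a minimal Python sketch of what the query above computes. It is only a model, not Redshift code: `unnest_with_numbers` and its parameters are hypothetical names, and `numbers` stands in for the `rn` values derived from `sometable`. It also illustrates why the helper table needs at least as many rows as the longest array.

```python
import json

def unnest_with_numbers(json_rows, numbers):
    """Model of: JOIN numbers n ON n.rn < json_array_length(t.id).

    Assumption: each row is a JSON array string. Redshift's
    json_extract_array_element_text returns text; this model keeps
    the parsed JSON values instead.
    """
    result = []
    for raw in json_rows:
        arr = json.loads(raw)
        for rn in numbers:
            # Only indexes inside the array survive the join condition.
            if rn < len(arr):
                result.append(arr[rn])
    return result

rows = ["[1,2,3,10]", "[1]", "[2,3,4,9]"]
full = unnest_with_numbers(rows, range(4))   # enough helper rows
short = unnest_with_numbers(rows, range(2))  # too few helper rows
print(full)
print(short)
```

With only two helper rows, every element past index 1 is silently dropped, so size the helper table for the longest array you expect.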
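I tried to get the same result on an array column using a recursive CTE, and this is what I came up with. Note that it assumes the id values are consecutive and start at 1. I'm sure there is a better way, but here it is anyway: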
WITH recursive unnested(id, elem, idx) AS (
SELECT
id,
arr_column [0] AS elem,
0 AS idx
FROM
nest_column_table
WHERE
id = 1
UNION ALL
SELECT
(
CASE
WHEN umi.idx + 2 >= get_array_length(ci.arr_column) THEN umi.id + 1
ELSE umi.id
END
) AS id,
arr_column [umi.idx + 1] AS elem,
(
CASE
WHEN umi.idx + 2 >= get_array_length(ci.arr_column) THEN -1
ELSE umi.idx + 1
END
) AS idx
FROM
nest_column_table ci
INNER JOIN unnested umi ON umi.id = ci.id
)
SELECT
*
FROM
unnested;
Here is a simple solution:
-- unnest with REDSHIFT
create table test (
event_name varchar,
json_arr varchar
);
insert into test values
('hello', '["a", "b", "c"]'),
('goodbye', '["d", "e", "f"]');
with parsed as (
select *, json_parse(json_arr) as parsed_account_ids from test
)
SELECT index, element, event_name FROM parsed AS b, b.parsed_account_ids AS element AT index;
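As a rough model of what the `AS element AT index` unnesting produces, the following Python sketch pairs each array element with its position and the parent row's `event_name`. The function name `unnest_at_index` is hypothetical, the index is assumed to be zero-based, and the row order shown here is an artifact of the loop rather than something Redshift promises without an ORDER BY.

```python
import json

def unnest_at_index(rows):
    """rows: list of (event_name, json_arr) tuples, like the test table.

    Returns (index, element, event_name) tuples, one per array element.
    """
    out = []
    for event_name, json_arr in rows:
        # json.loads plays the role of json_parse in this model.
        for index, element in enumerate(json.loads(json_arr)):
            out.append((index, element, event_name))
    return out

test_rows = [("hello", '["a", "b", "c"]'), ("goodbye", '["d", "e", "f"]')]
unnested = unnest_at_index(test_rows)
for row in unnested:
    print(row)
```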