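How can we transpose a Redshift table from columns to rows?
For example, suppose we have a generic table (its columns are not known in advance) like the following: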
source table:
date id alfa beta gamma ... omega
2018-08-03 1 1 2 3 4
2018-08-03 2 4 3 2 1
...
2018-09-04 1 3 1 2 4
...
How can we obtain the following result?
transposed table:
date id column_name column_value
2018-08-03 1 alfa 1
2018-08-03 1 beta 2
...
2018-08-03 2 omega 1
...
2018-09-04 1 gamma 2
...
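The target table and the number of columns (alfa, beta, gamma, ..., omega) are both dynamic, so we are looking for a solution that does not need a case when mapping for each column, because we would like to apply it to several different tables.
However, all target tables will have the date and id fields (or, failing that, a primary key or candidate key in every table).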
Our Redshift version is:
PostgreSQL 8.0.2, Redshift 1.0.3380
How can we do this?
You need to hard-code the column names into the query.
CREATE TABLE stack(date TEXT, id BIGINT, alpha INT, beta INT, gamma INT, omega INT);
INSERT INTO stack VALUES('2018-08-03', 1, 1, 2, 3, 4);
INSERT INTO stack VALUES('2018-08-03', 2, 4, 3, 2, 1);
INSERT INTO stack VALUES('2018-08-04', 1, 3, 1, 2, 4);

-- One SELECT per source column; each branch labels its values with the column name.
-- UNION ALL keeps every row and skips the duplicate elimination that plain UNION performs.
SELECT
    date,
    id,
    col,
    col_value
FROM
(
    SELECT date, id, alpha AS col_value, 'alpha' AS col FROM stack
    UNION ALL
    SELECT date, id, beta AS col_value, 'beta' AS col FROM stack
    UNION ALL
    SELECT date, id, gamma AS col_value, 'gamma' AS col FROM stack
    UNION ALL
    SELECT date, id, omega AS col_value, 'omega' AS col FROM stack
) AS data
ORDER BY date, id, col;
The result is:
2018-08-03 1 alpha 1
2018-08-03 1 beta 2
2018-08-03 1 gamma 3
2018-08-03 1 omega 4
2018-08-03 2 alpha 4
2018-08-03 2 beta 3
2018-08-03 2 gamma 2
2018-08-03 2 omega 1
2018-08-04 1 alpha 3
2018-08-04 1 beta 1
2018-08-04 1 gamma 2
2018-08-04 1 omega 4
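Since the query has one identical branch per column, the hard-coding can be moved out of SQL and into a small generator script. The following is only a sketch of that idea, not part of the answer above: the function name `build_unpivot_sql` and its parameters are hypothetical, the column list is assumed to be supplied by the caller, all value columns are assumed to have compatible types, and identifiers are assumed to be trusted (they are not quoted or escaped).

```python
def build_unpivot_sql(table, key_columns, value_columns):
    """Return a UNION ALL query with one SELECT branch per value column.

    Hypothetical helper: table and column names are inserted as-is,
    so they must come from a trusted source.
    """
    keys = ", ".join(key_columns)
    branches = [
        f"SELECT {keys}, '{col}' AS col, {col} AS col_value FROM {table}"
        for col in value_columns
    ]
    # The trailing ORDER BY applies to the combined result of all branches.
    return "\nUNION ALL\n".join(branches) + f"\nORDER BY {keys}, col"


if __name__ == "__main__":
    print(build_unpivot_sql("stack", ["date", "id"],
                            ["alpha", "beta", "gamma", "omega"]))
```

The generated text can then be run against Redshift like any other query, which lets the same script serve several tables by changing only the arguments.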
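Rather than replying in the comments, here is some semi-pseudocode to explain how I did it. Let me know if you need more information or clarification.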
# dictionary that defines the target structure
target_d = {'date': '', 'id': '', 'column_name': '', 'column_value': ''}
# dictionary for the source structure
source_d = {'date': 'date', 'id': 'id', 'column_name1': '', 'column_name2': '', ...}
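In the source dict above you declare which fields are mapped: a mapped field is kept as-is (it is not pivoted), and every other field/column is pivoted. You could enhance this to be dynamic by building the dict from the source table's DDL.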
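As a rough illustration of building that source mapping automatically, the sketch below derives it from a plain list of column names. The helper name `build_source_map` is hypothetical, and it assumes the column list has already been obtained somehow (for example by reading the table's DDL); how that list is fetched is left out.

```python
def build_source_map(all_columns, key_columns):
    """Map key columns to themselves and every other column to ''.

    Hypothetical helper: a non-empty value means "keep as-is",
    an empty value means "pivot this column".
    """
    return {col: (col if col in key_columns else "") for col in all_columns}


# Example column names taken from the question's sample table.
source_d = build_source_map(["date", "id", "alfa", "beta"], ["date", "id"])
```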
# assuming you have already read your source data
# your while loop to go through the incoming data
while <your code here>:
    # create a dict to process an incoming row
    curr_d = target_d.copy()
    curr_d['date'] = <date from incoming record>
    curr_d['id'] = <id from incoming record>
    # since we are going to create a row for each column name/value combination,
    # we need a new dict to hold the values
    out_d = curr_d.copy()
The line above serves two purposes: it creates a new dict for the output row, and it preserves the persistent part of the output row (i.e. date and id).
    # the rest of the fields are pivoted now
    for afield in source_d:
        if afield not in source_d.values():
            out_d['column_name'] = afield
            out_d['column_value'] = <value of afield from incoming record>
            <create a 'row' from your out_d dict>
            <write to output / append to the output data frame (if you use a data frame)>
The while loop walks through the source rows, and the for loop creates a new target row for each column name/value combination.
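For readers who want something runnable, the two loops could look roughly like the following. This is a minimal sketch under stated assumptions rather than the exact code used for this answer: it assumes each source row is already available as a Python dict, the function name `unpivot` and the variable `source_rows` are hypothetical, and the sample values come from the question.

```python
def unpivot(records, key_columns):
    """Turn each non-key column of each record into its own output row."""
    rows = []
    for record in records:                      # outer loop: one pass per source row
        fixed = {key: record[key] for key in key_columns}
        for name, value in record.items():      # inner loop: one output row per column
            if name in key_columns:
                continue                         # key columns are kept, not pivoted
            row = dict(fixed)                    # fresh dict so rows do not share state
            row["column_name"] = name
            row["column_value"] = value
            rows.append(row)
    return rows


# Hypothetical sample input: rows already read from the source table.
source_rows = [
    {"date": "2018-08-03", "id": 1, "alfa": 1, "beta": 2},
    {"date": "2018-08-03", "id": 2, "alfa": 4, "beta": 3},
]
result = unpivot(source_rows, ["date", "id"])
```

Each output row gets its own dict, which avoids the rows accidentally sharing one mutable object when they are appended to a list.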
Let me know if this works for you.