UPDATE
target_table AS tgt
SET
campaign_name = src.campaign_name,
line_of_business = src.Campaign_Line_Of_Business
FROM
(
SELECT
A.project_id ,
A.campaign_id ,
A.run_id ,
B.Campaign_Line_Of_Business,
B.campaign_name
FROM
target_table A
left JOIN src_table B
ON
(A.Project_id = B.Project_id
AND A.Campaign_id = B.Campaign_id
AND A.Run_id = B.Run_id)
GROUP BY
1,
2,
3,
4,
5
) src
WHERE
(tgt.project_id = src.project_id
and tgt.campaign_id = src.campaign_id
and tgt.run_id = src.run_id);
Projection segmentation and ordering...
ORDER BY target_table.project_id,
target_table.campaign_id,
target_table.run_id,
target_table.campaign_node_id,
target_table.locale_cd
SEGMENTED BY hash(target_table.project_id,
target_table.campaign_id,
target_table.run_id,
target_table.campaign_node_id,
target_table.message_id,
target_table.locale_cd) ALL NODES KSAFE 1;
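I'm not completely sure: using a query made up of a join and a GROUP BY in the FROM clause of an UPDATE statement is a special case.
But:
The driving input table, which is also the target table, is segmented by the hash of these six columns: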
SEGMENTED BY hash(
target_table.project_id,
target_table.campaign_id,
target_table.run_id,
target_table.campaign_node_id,
target_table.message_id,
target_table.locale_cd
)
...while the join condition in the WHERE clause consists of only these three:
tgt.project_id = src.project_id
and tgt.campaign_id = src.campaign_id
and tgt.run_id = src.run_id
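If we segment by the columns
a,b,c,d,e,f
and these are all integers, then the combination:
a | b | c | d | e | f
1 | 1 | 1 | 1 | 1 | 1
can be on node 1, and the combination:
a | b | c | d | e | f
1 | 1 | 1 | 1 | 2 | 1
can be on node 2, because it yields a different hash value. So the same
a,b,c
combination used for the join can exist on every node, and resegmentation is therefore indispensable to make the join possible.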
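The argument above can be checked with a small toy model. This is only a sketch: it uses SHA-256 modulo an assumed node count as a stand-in for hash segmentation, which is not Vertica's actual hash function, and `NUM_NODES` and `node_for` are hypothetical names introduced for illustration.

```python
import hashlib

NUM_NODES = 3  # assumed cluster size, for illustration only


def node_for(values, num_nodes=NUM_NODES):
    """Toy stand-in for hash segmentation: map a tuple of column values to a node.

    This is NOT Vertica's hash function; it only mimics the idea that
    different input tuples generally produce different hash values.
    """
    digest = hashlib.sha256(repr(tuple(values)).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


# Rows (a, b, c, d, e, f) that all share the join key (a, b, c) = (1, 1, 1)
# and differ only in column e.
rows = [(1, 1, 1, 1, e, 1) for e in range(1, 51)]

# Segmenting by all six columns: rows with the same join key scatter.
nodes_six = {node_for(row) for row in rows}

# Segmenting by the three join columns only: the same key always
# maps to the same node, so matching rows are co-located.
nodes_three = {node_for(row[:3]) for row in rows}

print("nodes holding key (1,1,1) with six-column segmentation:", sorted(nodes_six))
print("nodes holding key (1,1,1) with three-column segmentation:", sorted(nodes_three))
```

In this model, the six-column segmentation spreads one join key over several nodes, while hashing only the join columns keeps every row for that key on a single node. That is the situation in which a join on a,b,c would not need the rows to be moved first.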