MERGE INTO to UPDATE/INSERT

Question

I am having trouble merging a large dataset in a Databricks notebook. How can I convert the MERGE INTO script below into separate UPDATE/INSERT statements?

target_table = f"""
  MERGE INTO {target_table_name} target
  USING {staging_table_name} source
  ON
      source.ResponseRgBasketId = target.ResponseRgBasketId
  AND source.RequestTimestamp   = target.RequestTimestamp
  WHEN MATCHED
  THEN UPDATE SET
      *
  WHEN NOT MATCHED
  THEN INSERT
      *
"""

Answer 1

For transactional tables, updating from another table is supported only through MERGE. See: https://community.cloudera.com/t5/Support-Questions/update-one-hive-table-based-on-another-table/td-p/160017

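So in ACID mode you can use the same MERGE, but without the WHEN NOT MATCHED clause, for the update-only step. For the insert-only step, use NOT EXISTS or a LEFT JOIN, as in the non-ACID "Insert only" example below (the same statement also works for ACID tables).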
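As a minimal sketch of the update-only MERGE described above, written in the same f-string style as the question: the helper name build_update_only_merge and the table names db.target and db.staging are made up for illustration, and UPDATE SET * is kept from the question on the assumption that the target is a Delta table on Databricks.

```python
def build_update_only_merge(target_table_name: str, staging_table_name: str) -> str:
    """Return a MERGE statement that only updates rows already present in the target.

    Hypothetical helper: it is the question's statement with the
    WHEN NOT MATCHED ... INSERT branch removed.
    """
    return f"""
  MERGE INTO {target_table_name} target
  USING {staging_table_name} source
  ON  source.ResponseRgBasketId = target.ResponseRgBasketId
  AND source.RequestTimestamp   = target.RequestTimestamp
  WHEN MATCHED THEN UPDATE SET *
"""


# Table names below are placeholders, not taken from the question.
update_sql = build_update_only_merge("db.target", "db.staging")
print(update_sql)
```

In a Databricks notebook the resulting string would typically be passed to spark.sql(update_sql).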

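With non-ACID tables you can perform these operations separately, but you have to use INSERT OVERWRITE + LEFT JOIN in place of UPDATE.

You can create a non-ACID table and overwrite it (the whole table) using a LEFT JOIN. In that case, however, splitting the insert from the update gains you nothing, because both operations need the join anyway.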

Update only:

create table new_target_table --if table exists, use INSERT OVERWRITE table new_target_table 
as 

select col1, ... colN,
       coalesce (s.col, t.col) as col
       ...
 from target_table_name t 
      left join source s
      on s.ResponseRgBasketId = t.ResponseRgBasketId
         and s.RequestTimestamp   = t.RequestTimestamp

Insert only:

insert into new_target_table 
select s.* --assumes source has the same column layout as new_target_table
  from source s 
       left join target t 
       on s.ResponseRgBasketId = t.ResponseRgBasketId
         and s.RequestTimestamp   = t.RequestTimestamp
where t.ResponseRgBasketId is null --the row does not exist in target yet
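To see the insert-only anti-join pattern in action without a cluster, here is a small sketch that runs it against an in-memory SQLite database from Python. SQLite stands in for Hive/Databricks purely for illustration. The single payload column col and the sample rows are invented, and the new rows go straight into target rather than into new_target_table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Invented sample data: key (1, '2024-01-01') exists in both tables,
# key (2, '2024-01-02') exists only in source.
conn.executescript("""
    CREATE TABLE target (ResponseRgBasketId INTEGER, RequestTimestamp TEXT, col TEXT);
    CREATE TABLE source (ResponseRgBasketId INTEGER, RequestTimestamp TEXT, col TEXT);
    INSERT INTO target VALUES (1, '2024-01-01', 'old');
    INSERT INTO source VALUES (1, '2024-01-01', 'new');
    INSERT INTO source VALUES (2, '2024-01-02', 'added');
""")

# Insert only: keep source rows whose key has no match in target.
conn.execute("""
    INSERT INTO target
    SELECT s.ResponseRgBasketId, s.RequestTimestamp, s.col
      FROM source s
           LEFT JOIN target t
           ON s.ResponseRgBasketId = t.ResponseRgBasketId
          AND s.RequestTimestamp   = t.RequestTimestamp
     WHERE t.ResponseRgBasketId IS NULL
""")

rows = conn.execute(
    "SELECT ResponseRgBasketId, RequestTimestamp, col FROM target ORDER BY ResponseRgBasketId"
).fetchall()
print(rows)
```

The matched row keeps its old value because this step never updates. Only the unmatched source row is appended.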

INSERT and UPDATE together (equivalent to MERGE in ACID mode):

insert overwrite table new_target_table 

select case when s.ResponseRgBasketId is null then t.col else s.col end   as col,
       case when s.ResponseRgBasketId is null then t.col2 else s.col2 end as col2
       ...
from source s 
           FULL JOIN target t 
           on s.ResponseRgBasketId = t.ResponseRgBasketId
             and s.RequestTimestamp   = t.RequestTimestamp
