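I need help building my first incremental model.
In the campaignchannel table I have a surrogate_key column, built by concatenating clientcode_id with an index number, and it acts as a unique key. The idea is that when a new campaign is created, we find the maximum surrogate_key for that client and add 1, so that every campaign of a client gets a unique id. Does incremental work in this case?
Also, is it a good idea to keep a backup of an incremental model?
Here is the dbt SQL code: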
-- dbt campaign channel level dimension : dim_campaignchannel
{{ config(materialized='incremental', unique_key='surrogate_key') }}
SELECT --ROW_NUMBER() OVER (ORDER BY c.clientcode, c.adservername) AS index_number,
CAST(CONCAT(c.clientcode_id, ROW_NUMBER() OVER (ORDER BY c.clientcode, c.adservername)) as numeric) AS surrogate_key, c.* from
(
SELECT b.clientcode_id, a.clientcode, a.adservername, a.mediachannel, a.adtech, a.programname, a.funnelstage, a.period, a.season, a.campaigntype
,a.lob, a.businessline, a.objective, a.market, a.targettype, a.subcampaign, a.campaignyear, a.startdate, a.enddate
FROM public.map_campaign_segments a
JOIN
public.map_client_segments b ON a.clientcode = b.clientcode
where a.adservername != '-'
order by a.clientcode, a.adservername) c
order by c.clientcode, c.adservername
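The expression CAST(CONCAT(c.clientcode_id, ROW_NUMBER() ...) AS numeric) can also give two different campaigns the same key, because the digits of the id and the row number run together. Below is a small Python sketch with made-up ids. The ROW_NUMBER in the model is global across clients, so whether this exact collision occurs depends on how the clientcode values sort. The separated_key helper is hypothetical and is shown only for contrast.

```python
# Mimics CAST(CONCAT(clientcode_id, row_number) AS numeric) from the model above.
def concat_key(clientcode_id: int, row_number: int) -> int:
    """Glue the digits together, then read the result back as a number."""
    return int(f"{clientcode_id}{row_number}")


# Hypothetical separator-based variant: the boundary between the two parts stays visible.
def separated_key(clientcode_id: int, row_number: int) -> str:
    return f"{clientcode_id}-{row_number}"


# Made-up ids: client 1 on global row 23, client 12 on global row 3.
print(concat_key(1, 23), concat_key(12, 3))        # 123 123  (collision)
print(separated_key(1, 23), separated_key(12, 3))  # 1-23 12-3
```

Two ways to remove the ambiguity are keeping clientcode_id and the per-client number as separate columns (a composite key), or joining them with a separator.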
Here is what I came up with. I'm not sure whether it is the right approach:
-- dbt campaign channel level dimension : dim_campaignchannel_master
{{ config(materialized='incremental',
unique_key='surrogate_key') }}
SELECT
CASE
WHEN EXISTS (
SELECT 1 FROM dim_campaignchannel WHERE adservername = c.adservername
) THEN
CAST(CONCAT(c.clientcode_id, ROW_NUMBER() OVER (ORDER BY c.clientcode, c.adservername)) as numeric) -- Generate a new surrogate key if adservername exists
ELSE
(SELECT MAX(surrogate_key) + 1 FROM dim_campaignchannel WHERE clientcode = c.clientcode) -- Increment the surrogate key for existing clientcode
END AS surrogate_key,
c.*
FROM (
SELECT
b.clientcode_id, a.clientcode, a.adservername, a.mediachannel, a.adtech, a.programname, a.funnelstage, a.period, a.season, a.campaigntype,
a.lob, a.businessline, a.objective, a.market, a.targettype, a.subcampaign, a.campaignyear, a.startdate, a.enddate
FROM
public.map_campaign_segments a
JOIN
public.map_client_segments b ON a.clientcode = b.clientcode
WHERE
a.adservername != '-'
ORDER BY
a.clientcode, a.adservername
) c
{% if is_incremental() %}
-- this filter will only be applied on an incremental run
-- (keeps only campaigns whose adservername is not yet in dim_campaignchannel)
WHERE
NOT EXISTS (
SELECT 1 FROM dim_campaignchannel WHERE adservername = c.adservername
)
{% endif %}
ORDER BY
c.clientcode, c.adservername
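The ELSE branch has a further problem. The correlated MAX(surrogate_key) + 1 is computed against the table as it stood before the run, so two new campaigns for the same client in one run both receive the same key. Below is a minimal sketch using SQLite through Python's sqlite3 module, as a stand-in for the actual warehouse. The table contents are made up, and new_campaigns is a hypothetical table that plays the role of the filtered subquery c. It compares that pattern with offsetting MAX by a per-client ROW_NUMBER over the new batch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_campaignchannel (clientcode TEXT, adservername TEXT, surrogate_key INTEGER);
INSERT INTO dim_campaignchannel VALUES ('acme', 'camp_a', 11), ('acme', 'camp_b', 12);
CREATE TABLE new_campaigns (clientcode TEXT, adservername TEXT);
INSERT INTO new_campaigns VALUES ('acme', 'camp_c'), ('acme', 'camp_d');
""")

# Same pattern as the ELSE branch: one MAX(...) + 1 per row, evaluated on the old table.
naive = conn.execute("""
SELECT n.adservername,
       (SELECT MAX(d.surrogate_key) + 1 FROM dim_campaignchannel d
        WHERE d.clientcode = n.clientcode) AS surrogate_key
FROM new_campaigns n
ORDER BY n.adservername
""").fetchall()
print(naive)  # [('camp_c', 13), ('camp_d', 13)]  -> duplicate key

# Offset the old maximum by a per-client row number within the new batch.
fixed = conn.execute("""
SELECT n.adservername,
       (SELECT MAX(d.surrogate_key) FROM dim_campaignchannel d
        WHERE d.clientcode = n.clientcode)
       + ROW_NUMBER() OVER (PARTITION BY n.clientcode ORDER BY n.adservername) AS surrogate_key
FROM new_campaigns n
ORDER BY n.adservername
""").fetchall()
print(fixed)  # [('camp_c', 13), ('camp_d', 14)]
```

A client with no existing rows gets NULL from MAX in both queries. That case would need separate handling (for example with COALESCE), which this sketch does not cover.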
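There may be a misunderstanding about what incremental materialization is for.
Incremental models:
✅ are meant to optimize dbt workloads (when dbt run takes too long to execute).
At this point you most likely don't need an incremental model.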
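The dbt best practices guide says:
Start as simple as possible.
So once dbt runs become too long and too heavy, that is a good time to start thinking about switching to incremental. Incremental materialization also has nothing to do with generating surrogate, primary, or auto-incrementing keys.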
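The following is a simplified mental model, not dbt's actual implementation (the generated SQL depends on the adapter and its incremental_strategy). In this model, an incremental run with unique_key replaces target rows whose key appears in the new batch and appends the rest. It takes whatever key the model computes and does not generate or stabilize keys. The sketch below uses made-up rows in which a ROW_NUMBER-style key has shifted between two runs.

```python
def incremental_merge(existing: dict, batch: list, unique_key: str) -> dict:
    """Simplified upsert: batch rows overwrite target rows with the same key, others are appended."""
    merged = dict(existing)  # leave the original target untouched
    for row in batch:
        merged[row[unique_key]] = row
    return merged


# Target table after the first run (made-up keys).
target = {
    101: {"surrogate_key": 101, "adservername": "camp_a"},
    102: {"surrogate_key": 102, "adservername": "camp_c"},
}
# Second run: new campaign camp_b sorts between the others, so camp_c's computed key shifts to 103.
batch = [
    {"surrogate_key": 101, "adservername": "camp_a"},
    {"surrogate_key": 102, "adservername": "camp_b"},
    {"surrogate_key": 103, "adservername": "camp_c"},
]
result = incremental_merge(target, batch, "surrogate_key")
print(sorted((k, r["adservername"]) for k, r in result.items()))
# [(101, 'camp_a'), (102, 'camp_b'), (103, 'camp_c')]
```

The merge itself succeeds. However, key 102 now belongs to camp_b, so anything that stored 102 for camp_c after the first run now silently points to a different campaign.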
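A few notes:
ROW_NUMBER() OVER (ORDER BY c.clientcode, c.adservername)
can be error-prone, because the same client campaign can get a different surrogate_key on different runs: any new campaign or client that sorts in front of an existing one shifts every number after it. If possible, consider numbering by a creation timestamp so that new campaigns are appended rather than slotted in between existing ones, e.g. ROW_NUMBER() OVER (PARTITION BY c.clientcode ORDER BY c.campaign_created_timestamp, c.adservername).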
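To see the difference, the sketch below compares three numberings before and after a newer campaign arrives. It uses SQLite through Python's sqlite3 module (window functions require SQLite 3.25 or newer) as a stand-in for the warehouse, with made-up rows. campaign_created_timestamp is the hypothetical column from the note above. Per-client numbering in creation order only stays stable under the assumption that every new campaign has a later timestamp than that client's existing ones. Appending the timestamp after adservername does not help, because name order still comes first.

```python
import sqlite3

# In-memory stand-in for the campaigns table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaigns (clientcode TEXT, adservername TEXT, campaign_created_timestamp TEXT);
INSERT INTO campaigns VALUES
  ('acme', 'camp_a', '2023-01-01'),
  ('acme', 'camp_c', '2023-01-02');
""")

# Numbering used in the model: one global sequence ordered by client and name.
BY_NAME = "ROW_NUMBER() OVER (ORDER BY clientcode, adservername)"
# Timestamp appended after the name: ordering is still driven by the name.
BY_NAME_THEN_TIME = ("ROW_NUMBER() OVER (ORDER BY clientcode, adservername, "
                     "campaign_created_timestamp)")
# Per-client numbering in creation order.
BY_TIME_PER_CLIENT = ("ROW_NUMBER() OVER (PARTITION BY clientcode "
                      "ORDER BY campaign_created_timestamp, adservername)")


def numbers(window_expr: str) -> dict:
    """Map adservername -> row number for the given window expression."""
    sql = f"SELECT adservername, {window_expr} FROM campaigns"
    return dict(conn.execute(sql).fetchall())


schemes = [BY_NAME, BY_NAME_THEN_TIME, BY_TIME_PER_CLIENT]
before = [numbers(s) for s in schemes]

# A newer campaign whose name sorts between the two existing ones.
conn.execute("INSERT INTO campaigns VALUES ('acme', 'camp_b', '2023-01-03')")
after = [numbers(s) for s in schemes]

for scheme, b, a in zip(schemes, before, after):
    print(b["camp_c"], "->", a["camp_c"], "|", scheme)
# camp_c goes 2 -> 3 under the first two schemes and stays 2 under the per-client one
```

If clientcode together with adservername uniquely identifies a campaign, another option, separate from the approach above, is to hash those columns (for example with the dbt_utils package's generate_surrogate_key macro). That removes any dependence on row order.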