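Here is the scenario:
I have data in two schemas in Snowflake, and I want to verify that two tables contain the same data values. I set up a dynamic SQL approach in Snowflake to do this.
My code first uses CTEs to retrieve the columns of both tables, then applies the MD5 hash function to generate hash values for comparison. Finally, I build a dynamic SQL query to display the comparison results.
However, the dynamic SQL only displays the SELECT statement itself instead of returning the actual output, so I am not sure whether my approach is entirely correct.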
Here is my code snippet:
-- Step 1: Extract the column list for both tables (schema_1 & schema_2)
WITH column_list_schema_1 AS (
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'schema_1'
AND TABLE_NAME = 'table_1'
),
column_list_schema_2 AS (
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'schema_2'
AND TABLE_NAME = 'table_2'
),
-- Step 2: Generate dynamic SQL for comparing values in each row and generating hash values for comparison
dynamic_sql AS (
SELECT
LISTAGG(
'CASE WHEN h."' || COLUMN_NAME || '" != s."' || COLUMN_NAME || '" THEN ''' || COLUMN_NAME || ''' END AS "' || COLUMN_NAME || '"',
', '
) WITHIN GROUP (ORDER BY COLUMN_NAME) AS column_comparisons,
-- Generate MD5 hashes for comparison (on a row level)
LISTAGG(
'MD5(CAST(h."' || COLUMN_NAME || '" AS STRING))', ', '
) WITHIN GROUP (ORDER BY COLUMN_NAME) AS h_hash, -- alias matches the ds.h_hash references in Step 3
LISTAGG(
'MD5(CAST(s."' || COLUMN_NAME || '" AS STRING))', ', '
) WITHIN GROUP (ORDER BY COLUMN_NAME) AS s_hash -- alias matches the ds.s_hash references in Step 3
FROM column_list_schema_1
WHERE COLUMN_NAME IN (SELECT COLUMN_NAME FROM column_list_schema_2)
)
-- Step 3: Generate the final comparison query string
SELECT
'SELECT h.row_id AS h_row_id,
s.row_id AS s_row_id,
h.parent_id AS parent_id,
MD5(CONCAT(' || ds.h_hash || ')) AS h_hash,
MD5(CONCAT(' || ds.s_hash || ')) AS s_hash,
CASE WHEN MD5(CONCAT(' || ds.h_hash || ')) = MD5(CONCAT(' || ds.s_hash || '))
THEN ''Match'' ELSE ''Mismatch'' END AS comparison_result,
' || ds.column_comparisons || '
FROM
(SELECT *, ROW_NUMBER() OVER () AS row_id FROM schema_1.your_table_name) h
FULL OUTER JOIN
(SELECT *, ROW_NUMBER() OVER () AS row_id FROM schema_2.your_table_name) s
ON h.row_id = s.row_id
WHERE MD5(CONCAT(' || ds.h_hash || ')) != MD5(CONCAT(' || ds.s_hash || '));'
FROM dynamic_sql ds;
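
A query like the one above returns the generated statement as a single string value; nothing in it executes that string. As a minimal sketch of one way to run such a string, assuming Snowflake Scripting is available, the block below stores the generated text in a variable and passes it to EXECUTE IMMEDIATE. The names `query_text` and `rs` and the stand-in generator query are hypothetical and not part of my original code; the real generating query would take the place of the stand-in.

```sql
-- Sketch only: query_text, rs, and the stand-in generator are hypothetical.
EXECUTE IMMEDIATE $$
DECLARE
    query_text STRING;  -- holds the generated SQL text
    rs RESULTSET;       -- holds the rows produced by running that text
BEGIN
    -- Stand-in for the generating query: it builds a SELECT statement as a string.
    SELECT 'SELECT ''Match'' AS comparison_result' INTO :query_text;

    -- Run the generated text instead of merely displaying it.
    rs := (EXECUTE IMMEDIATE :query_text);

    -- Return the rows of the executed statement to the caller.
    RETURN TABLE(rs);
END;
$$;
```

The outer EXECUTE IMMEDIATE $$ ... $$ wrapper is there so the block can be submitted as one statement from clients that do not accept anonymous blocks directly; in Snowsight the DECLARE ... END block can typically be run on its own.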