Table c_tbl has a features column of type map(varchar, array(int)). The mapping table has two columns: id and s_id.
Example rows in features:
row1
{
"0": [90],
"1":[80, "60", -87],
"2":[95, "67", 85]
}
row2
{
"0": [99],
"1":[82, "62", -107],
"2":[195, 167, -185]
}
Example rows in the mapping table:
67, 1111
167, 2222
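I need a Presto SELECT statement that produces two columns: features and features_updated.
features_updated should follow this logic: when the key is '2' and a value in that key's array matches the id column of the mapping table, replace the value with the corresponding s_id; otherwise leave it unchanged.
For example, features_updated would contain the following output: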
row1
{
"0": [90],
"1":[80, "60", -87],
"2":[95, "1111", 85]
}
row2
{
"0": [99],
"1":[82, "62", -107],
"2":[195, 2222, -185]
}
Note that ChatGPT did not give me a correct SQL query :P
I tried this approach:
SELECT
features,
map_agg(
key,
CASE
WHEN key = '2' AND m.s_id IS NOT NULL THEN CAST(m.s_id AS varchar)
ELSE CAST(value AS varchar)
END
) AS features_updated
FROM c_tbl
CROSS JOIN UNNEST(features) AS t(key, values)
CROSS JOIN UNNEST(values) AS val(value)
LEFT JOIN mapping m ON key = '2'
AND try(CAST(val.value AS integer)) = m.id
GROUP BY features
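Use the transform function to iterate over the array for each key in the map. For each value v, it looks up the s_id in the mapping table whose id equals v. If a match is found, v is replaced with s_id; otherwise the subquery returns NULL and COALESCE returns v.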
SELECT
    features,
    map_agg(key, CASE
        WHEN key = '2'
            -- replace each value that has a match in mapping, keep the rest
            THEN transform(vals, v -> COALESCE(
                (
                    SELECT s_id
                    FROM mapping
                    WHERE id = v
                ),
                v
            ))
        ELSE vals
    END) AS features_updated
FROM c_tbl
-- vals instead of VALUES, which is a reserved word
CROSS JOIN UNNEST(features) AS t(key, vals)
GROUP BY features
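One caveat: Presto's documentation on lambda expressions lists subqueries as unsupported inside a lambda body, so the query above may be rejected as written. Below is a minimal sketch of one possible workaround, not a definitive solution. It assumes that mapping is small enough to be collapsed into a single map with map_agg and that id is unique within it. The names lookup and id_to_s_id are hypothetical. The inline c_tbl and mapping CTEs are stand-ins built from the sample rows, with the quoted values written as plain integers so that they fit array(int).

```sql
WITH c_tbl (features) AS (
    -- stand-in for the real table, built from the sample rows
    VALUES
        (MAP(ARRAY['0', '1', '2'],
             ARRAY[ARRAY[90], ARRAY[80, 60, -87], ARRAY[95, 67, 85]])),
        (MAP(ARRAY['0', '1', '2'],
             ARRAY[ARRAY[99], ARRAY[82, 62, -107], ARRAY[195, 167, -185]]))
),
mapping (id, s_id) AS (
    -- stand-in for the real mapping table
    VALUES (67, 1111), (167, 2222)
),
lookup AS (
    -- collapse the mapping table into one map(id -> s_id); assumes id is unique
    SELECT map_agg(id, s_id) AS id_to_s_id
    FROM mapping
)
SELECT
    c.features,
    transform_values(
        c.features,
        -- only the array stored under key '2' is rewritten
        (k, vals) -> IF(
            k = '2',
            -- element_at returns NULL for a missing key, so COALESCE keeps x
            transform(vals, x -> COALESCE(element_at(l.id_to_s_id, x), x)),
            vals
        )
    ) AS features_updated
FROM c_tbl c
CROSS JOIN lookup l
```

Because transform_values rewrites the map in place, this version needs neither UNNEST nor GROUP BY, so rows with identical features are not merged. To run it against the real tables, drop the first two CTEs.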