I have a table that my daily script writes its results to. It looks like this:
product | issues_found | run_date |
---|---|---|
Alpha | 2 | 2024-8-12 |
Alpha | 5 | 2024-8-11 |
Alpha | 3 | 2024-8-10 |
Alpha | 0 | 2024-8-9 |
Alpha | 1 | 2024-8-8 |
Alpha | 5 | 2024-8-7 |
Beta | 4 | 2024-8-12 |
Beta | 3 | 2024-8-11 |
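For each product, I want to find the date on which its current run of issues started, i.e. the latest date on which `issues_found` changed from 0 to non-zero (or the product's first run, if it has never had a zero day), and add that date as another column. The table should then look like this: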
product | issues_found | run_date | issues_started |
---|---|---|---|
Alpha | 2 | 2024-8-12 | 2024-8-10 |
Alpha | 5 | 2024-8-11 | 2024-8-10 |
Alpha | 3 | 2024-8-10 | 2024-8-10 |
Alpha | 0 | 2024-8-9 | 2024-8-10 |
Alpha | 1 | 2024-8-8 | 2024-8-10 |
Alpha | 5 | 2024-8-7 | 2024-8-10 |
Beta | 4 | 2024-8-12 | 2024-8-11 |
Beta | 3 | 2024-8-11 | 2024-8-11 |
My approach is to iterate over each product's records sorted by date (newest first) and break as soon as I hit a 0:
```sql
DECLARE found_date DATE;
FOR row IN (SELECT * FROM my_table)
DO
  SET found_date = NULL;
  FOR historical_entry IN (SELECT * FROM my_table WHERE product = row.product ORDER BY run_date DESC)
  DO
    IF historical_entry.issues_found <> 0 THEN
      SET found_date = historical_entry.run_date;
    ELSE
      BREAK;
    END IF;
  END FOR;
  UPDATE my_table SET issues_started = found_date WHERE product = row.product;
END FOR;
```
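Because the outer loop reads `SELECT * FROM my_table`, it runs once per row rather than once per product, so each product's history is rescanned and re-`UPDATE`d as many times as that product has rows. The rule itself only needs one newest-first pass per product. A minimal Python sketch of that single pass, assuming the sample rows are already in memory; `ROWS` and `issues_started` are illustrative names, not part of the original script:

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (product, issues_found, run_date).
ROWS = [
    ("Alpha", 2, date(2024, 8, 12)),
    ("Alpha", 5, date(2024, 8, 11)),
    ("Alpha", 3, date(2024, 8, 10)),
    ("Alpha", 0, date(2024, 8, 9)),
    ("Alpha", 1, date(2024, 8, 8)),
    ("Alpha", 5, date(2024, 8, 7)),
    ("Beta", 4, date(2024, 8, 12)),
    ("Beta", 3, date(2024, 8, 11)),
]


def issues_started(rows):
    """Return {product: start date or None}, mirroring the nested loop:
    walk each product's history newest-first and stop at the first zero."""
    by_product = defaultdict(list)
    for product, issues_found, run_date in rows:
        by_product[product].append((run_date, issues_found))

    result = {}
    for product, history in by_product.items():
        found_date = None
        # Newest first, like ORDER BY run_date DESC in the inner loop.
        for run_date, issues_found in sorted(history, reverse=True):
            if issues_found != 0:
                found_date = run_date
            else:
                break  # the current run of non-zero days ends here
        result[product] = found_date
    return result


if __name__ == "__main__":
    for product, started in sorted(issues_started(ROWS).items()):
        print(product, started)
```

Each product's rows are sorted once and scanned once, instead of being rescanned for every row of the table.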
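This is a very procedural approach, and although it works, it takes hours when run against hundreds of products and thousands of records. Is there a better way than two nested loops? I tried aggregating with `MIN(run_date)` over the rows where `issues_found` is non-zero, but couldn't quite get it right.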
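The `MIN(run_date)` idea can be made to work by restricting it to the non-zero rows that come after each product's most recent zero day. A minimal sketch of that variant, assuming the table is called `my_table`; it runs against an in-memory SQLite database from Python's standard library only so it can be tried locally, and the correlated subquery would still need to be checked in BigQuery. Dates are stored as zero-padded ISO strings (`2024-08-12`, not `2024-8-12`) so that text comparison matches date order; `min_after_last_zero` is a hypothetical helper name:

```python
import sqlite3

# Sample data from the question, with zero-padded ISO date strings.
SAMPLE = [
    ("Alpha", 2, "2024-08-12"),
    ("Alpha", 5, "2024-08-11"),
    ("Alpha", 3, "2024-08-10"),
    ("Alpha", 0, "2024-08-09"),
    ("Alpha", 1, "2024-08-08"),
    ("Alpha", 5, "2024-08-07"),
    ("Beta", 4, "2024-08-12"),
    ("Beta", 3, "2024-08-11"),
]

# MIN(run_date) over non-zero rows, limited to rows after the product's most
# recent zero day. COALESCE covers products that never had a zero day.
QUERY = """
SELECT t.product, MIN(t.run_date) AS issues_started
FROM my_table AS t
WHERE t.issues_found <> 0
  AND t.run_date > COALESCE(
        (SELECT MAX(z.run_date) FROM my_table AS z
         WHERE z.product = t.product AND z.issues_found = 0),
        '0000-00-00')
GROUP BY t.product
ORDER BY t.product
"""


def min_after_last_zero(conn):
    """Return {product: start date string} for products with current issues."""
    return dict(conn.execute(QUERY).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (product TEXT, issues_found INTEGER, run_date TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", SAMPLE)
print(min_after_last_zero(conn))
```

Products whose most recent run found 0 issues have no qualifying rows and drop out of the result, which matches the loop leaving `found_date` NULL; a LEFT JOIN back to the product list would turn them into explicit NULLs.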
Consider the following approach:
```sql
WITH issue_starts AS (
  SELECT
    product,
    run_date,
    issues_found,
    CASE
      WHEN issues_found <> 0
       AND IFNULL(LAG(issues_found) OVER (PARTITION BY product ORDER BY run_date), 0) = 0
      THEN run_date
    END AS start_of_issues
  FROM your_table
), latest_issue_start AS (
  SELECT
    product,
    run_date,
    issues_found,
    -- Whole-partition MAX (no ORDER BY or frame), so every row of a product
    -- gets the product's latest start date, matching the expected output.
    MAX(start_of_issues) OVER (PARTITION BY product) AS issues_started
  FROM issue_starts
)
SELECT
  t.product,
  t.issues_found,
  t.run_date,
  l.issues_started
FROM your_table t
JOIN latest_issue_start l
  ON t.product = l.product AND t.run_date = l.run_date
ORDER BY t.product, t.run_date DESC
```
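To check the window-function logic without a BigQuery project, the same two steps (flag each 0-to-non-zero transition with `LAG`, then take the latest flagged date per product) can be run against an in-memory SQLite database, which supports `LAG` and window aggregates from version 3.25. This sketch assumes the table is named `my_table`, stores dates as zero-padded ISO strings so text ordering matches date ordering, and takes `MAX(start_of_issues)` over the whole product partition (no `ROWS` frame), which is what gives every row of a product the same date as in the question's expected table:

```python
import sqlite3

# Sample data from the question. Dates are zero-padded ISO strings so that
# SQLite's text ordering matches chronological order.
SAMPLE = [
    ("Alpha", 2, "2024-08-12"),
    ("Alpha", 5, "2024-08-11"),
    ("Alpha", 3, "2024-08-10"),
    ("Alpha", 0, "2024-08-09"),
    ("Alpha", 1, "2024-08-08"),
    ("Alpha", 5, "2024-08-07"),
    ("Beta", 4, "2024-08-12"),
    ("Beta", 3, "2024-08-11"),
]

QUERY = """
WITH issue_starts AS (
  -- A row is a start when it is non-zero and the previous run was zero
  -- (or there is no previous run for that product).
  SELECT product, issues_found, run_date,
         CASE
           WHEN issues_found <> 0
            AND IFNULL(LAG(issues_found) OVER (PARTITION BY product ORDER BY run_date), 0) = 0
           THEN run_date
         END AS start_of_issues
  FROM my_table
)
-- The latest start over the whole product partition goes on every row.
SELECT product, issues_found, run_date,
       MAX(start_of_issues) OVER (PARTITION BY product) AS issues_started
FROM issue_starts
ORDER BY product, run_date DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (product TEXT, issues_found INTEGER, run_date TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", SAMPLE)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(" | ".join(str(value) for value in row))
```

One behavioural difference from the original loop: when a product's most recent run found 0 issues, this query still returns that product's last start date, whereas the loop breaks immediately and leaves `issues_started` NULL.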
Applied to the sample data in your question, the output is: