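I have a dataset, and I want to find every transition from a class that is not in (Other, Others) to a class that is in (Other, Others):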
| row_id | class |
| ------ | ------- |
| 1 | Math |
| 2 | Math |
| 3 | Math |
| 4 | Math |
| 5 | Math |
| 6 | Math |
| 7 | Other |
| 8 | Other |
| 9 | Other |
| 10 | Biology |
| 11 | Biology |
| 12 | Other |
| 13 | Other |
| 14 | Biology |
| 15 | Biology |
| 16 | Others |
| 17 | Others |
| 18 | Others |
| 19 | Physics |
| 20 | Others |
So the result would be:
| row_id | class | prev_row_id | prev_class |
| ------ | ------- | ----------- | ---------- |
| 6 | Math | 7 | Other |
| 11 | Biology | 12 | Other |
| 15 | Biology | 16 | Others |
| 19 | Physics | 20 | Others |
I found out how to detect the last transition, but not every transition in the history. I am using Presto.
We can use the LEAD() analytic function here:
WITH cte AS (
    SELECT *,
           LEAD(class) OVER (ORDER BY row_id) AS lead_class,
           LEAD(row_id) OVER (ORDER BY row_id) AS lead_row_id
    FROM yourTable
)
SELECT
    row_id,
    class,
    lead_row_id AS next_row_id,
    lead_class AS next_class
FROM cte
WHERE class NOT IN ('Other', 'Others') AND
      lead_class IN ('Other', 'Others')
ORDER BY row_id;
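You can self-join the table, restricting the initial rows to the non-Other ones and joining each of them to the row that immediately follows it, provided that row is an Other one: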
select t.row_id, t.class, th.row_id prev_row_id , th.class prev_class
from my_table t
join my_table th
on (th.row_id = t.row_id + 1 and th.class in ('Other', 'Others'))
where t.class not in ('Other', 'Others')
*Note: this only works when the ids are consecutive (no gaps in row_id).
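If row_id can have gaps, one possible workaround is to number the rows first and join on that number instead of on row_id. The following is only a sketch under assumptions: it targets Presto/Trino, reuses the my_table name from the query above, and the CTE name numbered and the column rn are hypothetical helpers introduced for illustration.

```sql
-- Sketch: self-join that tolerates gaps in row_id.
-- "numbered" and "rn" are illustrative names, not part of the original schema.
WITH numbered AS (
    SELECT row_id,
           class,
           row_number() OVER (ORDER BY row_id) AS rn  -- consecutive position 1..N
    FROM my_table
)
SELECT t.row_id,
       t.class,
       th.row_id AS prev_row_id,  -- column names kept as in the question
       th.class  AS prev_class
FROM numbered t
JOIN numbered th
  ON th.rn = t.rn + 1                     -- the row right after t in row_id order
 AND th.class IN ('Other', 'Others')
WHERE t.class NOT IN ('Other', 'Others')
ORDER BY t.row_id;
```

Because rn is always consecutive, the join condition no longer depends on the actual row_id values. At that point the query does the same job as the window-function approaches, so LEAD() or lag() is probably the simpler choice unless a join is specifically needed.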
You can use window functions, for example lag:
-- sample data
with dataset(row_id, class) as(
values (1, 'Math'),
(2, 'Math'),
(6, 'Math'),
(7, 'Other'),
(9, 'Other'),
(10, 'Biology'),
(11, 'Biology'),
(12, 'Other'),
(13, 'Other'),
(14, 'Biology'),
(15, 'Biology'),
(16, 'Others'),
(17, 'Others'),
(18, 'Others'),
(19, 'Physics'),
(20, 'Others')
),
-- query parts
with_prev as(
SELECT *,
lag(row_id) OVER w AS prev_row_id,
lag(class) OVER w AS prev_class
FROM dataset
WINDOW w AS (ORDER BY row_id) -- Trino allows sharing window
)
select *
from with_prev
where prev_class not like 'Other%' and class like 'Other%'; -- or in ('Other', 'Others')
Output:
| row_id | class | prev_row_id | prev_class |
| ------ | ------- | ----------- | ---------- |
| 7 | Other | 6 | Math |
| 12 | Other | 11 | Biology |
| 16 | Others | 15 | Biology |
| 20 | Others | 19 | Physics |