I have this table of medical patients:
CREATE TABLE SAMPLE_DATA (
patient_name VARCHAR(50),
year INTEGER,
gender CHAR(1),
patient_weight DECIMAL(5,2),
location VARCHAR(20)
);
INSERT INTO SAMPLE_DATA (patient_name, year, gender, patient_weight, location) VALUES
('Sarah', 2010, 'F', 65.00, 'hospital'),
('Sarah', 2012, 'F', 66.00, 'home'),
('Sarah', 2013, 'F', 67.00, 'hospital'),
('Michael', 2011, 'M', 78.00, 'hospital'),
('Michael', 2013, 'M', 76.00, 'home'),
('Michael', 2015, 'M', 77.00, 'hospital'),
('James', 2010, 'M', 82.00, 'home'),
('James', 2014, 'M', 80.00, 'hospital'),
('Emma', 2012, 'F', 70.00, 'hospital'),
('Emma', 2013, 'F', 71.00, 'home'),
('Emma', 2015, 'F', 71.00, 'hospital'),
('Robert', 2011, 'M', 88.00, 'hospital'),
('Robert', 2014, 'M', 85.00, 'home'),
('Maria', 2010, 'F', 63.00, 'hospital'),
('Maria', 2012, 'F', 64.00, 'home'),
('Maria', 2015, 'F', 64.00, 'hospital');
It looks like this:
patient_name | year | gender | patient_weight | location
-------------|------|--------|----------------|----------
Sarah | 2010 | F | 65.00 | hospital
Sarah | 2012 | F | 66.00 | home
Sarah | 2013 | F | 67.00 | hospital
Michael | 2011 | M | 78.00 | hospital
Michael | 2013 | M | 76.00 | home
Michael | 2015 | M | 77.00 | hospital
James | 2010 | M | 82.00 | home
James | 2014 | M | 80.00 | hospital
Emma | 2012 | F | 70.00 | hospital
Emma | 2013 | F | 71.00 | home
Emma | 2015 | F | 71.00 | hospital
Robert | 2011 | M | 88.00 | hospital
Robert | 2014 | M | 85.00 | home
Maria | 2010 | F | 63.00 | hospital
Maria | 2012 | F | 64.00 | home
Maria | 2015 | F | 64.00 | hospital
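Desired result: I want to transform this table so that it shows what happened to each patient between each pair of consecutive measurements: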
patient_name | start_year | gender | start_weight | years_until_next | location_change
-------------|------------|---------|--------------|------------------|-------------------
Sarah | 2010 | F | 65.00 | 2 | hospital-home
Sarah | 2012 | F | 66.00 | 1 | home-hospital
Michael | 2011 | M | 78.00 | 2 | hospital-home
Michael | 2013 | M | 76.00 | 2 | home-hospital
James | 2010 | M | 82.00 | 4 | home-hospital
Emma | 2012 | F | 70.00 | 1 | hospital-home
Emma | 2013 | F | 71.00 | 2 | home-hospital
Robert | 2011 | M | 88.00 | 3 | hospital-home
Maria | 2010 | F | 63.00 | 2 | hospital-home
Maria | 2012 | F | 64.00 | 3 | home-hospital
I am new to the LAG and LEAD functions in SQL, and I tried the following:
WITH next_measurements AS (
    SELECT
        patient_name,
        year as start_year,
        gender,
        patient_weight as start_weight,
        location as start_location,
        LEAD(year) OVER (
            PARTITION BY patient_name
            ORDER BY year
        ) as next_year,
        LEAD(location) OVER (
            PARTITION BY patient_name
            ORDER BY year
        ) as next_location
    FROM sample_data
)
SELECT
    patient_name,
    start_year,
    gender,
    start_weight,
    (next_year - start_year) as years_until_next,
    LOWER(start_location) || '-' || LOWER(next_location) as location_change
FROM next_measurements
WHERE next_year IS NOT NULL
ORDER BY patient_name, start_year;
The code seems to work:
patient_name start_year gender start_weight years_until_next location_change
Emma 2012 F 70 1 hospital-home
Emma 2013 F 71 2 home-hospital
James 2010 M 82 4 home-hospital
Maria 2010 F 63 2 hospital-home
Maria 2012 F 64 3 home-hospital
Michael 2011 M 78 2 hospital-home
Michael 2013 M 76 2 home-hospital
Robert 2011 M 88 3 hospital-home
Sarah 2010 F 65 2 hospital-home
Sarah 2012 F 66 1 home-hospital
Is this the correct way to use these functions?
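LAG: accesses data from a previous row.
LEAD: accesses data from a following row.
Both LAG and LEAD let you compare the current row with the previous or next row by reading values across rows, without needing a self-join.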
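To make the self-join comparison concrete, here is a rough sketch against the sample_data table from the question. Both queries should pair each measurement with the year of the same patient's next measurement. The aliases cur and nxt are illustrative names only, and the sketch assumes a patient has at most one row per year.

```sql
-- Self-join version: for each row, find the earliest later row
-- belonging to the same patient.
SELECT cur.patient_name,
       cur.year      AS start_year,
       MIN(nxt.year) AS next_year   -- NULL when there is no later row
FROM sample_data cur
LEFT JOIN sample_data nxt
       ON nxt.patient_name = cur.patient_name
      AND nxt.year > cur.year
GROUP BY cur.patient_name, cur.year
ORDER BY cur.patient_name, cur.year;

-- Window-function version: same pairing, no join needed.
SELECT patient_name,
       year AS start_year,
       LEAD(year) OVER (PARTITION BY patient_name ORDER BY year) AS next_year
FROM sample_data
ORDER BY patient_name, start_year;
```

The self-join version only recovers the next year. Fetching the next location as well would need a further join back to the table, whereas the window version just adds another LEAD(location) column.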
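Your query correctly uses LEAD on year and location to retrieve the values from the next row, and then derives the time gap and the location change from them.
I don't see LAG used anywhere in your query, though, and I don't think your desired result needs it. If you did want to use both, so that you also get the change from the previous measurement, it would look like this: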
WITH next_measurements AS (
    SELECT
        patient_name,
        year as start_year,
        gender,
        patient_weight as start_weight,
        location as start_location,
        -- LEAD reads from the patient's next measurement
        LEAD(year) OVER (
            PARTITION BY patient_name
            ORDER BY year
        ) as next_year,
        LEAD(location) OVER (
            PARTITION BY patient_name
            ORDER BY year
        ) as next_location,
        -- LAG reads from the patient's previous measurement
        LAG(year) OVER (
            PARTITION BY patient_name
            ORDER BY year
        ) as prior_year,
        LAG(location) OVER (
            PARTITION BY patient_name
            ORDER BY year
        ) as prior_location
    FROM sample_data
)
SELECT
    patient_name,
    start_year,
    gender,
    start_weight,
    (next_year - start_year) as years_until_next,
    LOWER(start_location) || '-' || LOWER(next_location) as location_change,
    -- prior_year is NULL for each patient's first measurement
    (start_year - prior_year) as years_until_prior,
    LOWER(prior_location) || '-' || LOWER(start_location) as prior_location_change
FROM next_measurements
WHERE next_year IS NOT NULL  -- drops each patient's last measurement
ORDER BY patient_name, start_year;
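The four OVER clauses in the query above are identical, so on databases that support a named WINDOW clause (PostgreSQL does; check the documentation for other engines) they could be declared once. A minimal sketch, where w is an arbitrary window name and years_since_prior is a hypothetical column alias:

```sql
SELECT
    patient_name,
    year AS start_year,
    -- gap to the next measurement; NULL on each patient's last row
    (LEAD(year) OVER w) - year AS years_until_next,
    -- gap from the previous measurement; NULL on each patient's first row
    year - (LAG(year) OVER w) AS years_since_prior
FROM sample_data
WINDOW w AS (PARTITION BY patient_name ORDER BY year)
ORDER BY patient_name, start_year;
```

Unlike the query above, this sketch does not filter out rows where there is no next measurement, so every row of sample_data appears in the result.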
Examples: https://www.ibm.com/docs/en/psfa/7.1.0?topic=functions-lag-lead-family-syntax