LAG window function in Redshift: showing previous-year and current-year values in a single row

Question | Votes: 2 | Answers: 1

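I need to show the previous-year and current-year values in a single row for each combination of a set of columns. The scenario: I have a dataset like this: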

Student City    Country Year Month Subject Marks
John    Boston  USA    2018  01    Maths   90
Mark    London  UK     2018  01    Maths   95
John    Boston  USA    2019  01    Maths   95
Mark    London  UK     2019  01    Maths   83
John    Boston  USA    2018  01    Arts    90
Mark    London  UK     2018  01    Arts    95
John    Boston  USA    2019  01    Arts    95
Mark    London  UK     2019  01    Arts    83

I want the output to be:

Student  City  Country  Year  Month  Maths_curr  Maths_prev  Arts_curr Arts_prev  
John     Boston USA     2019  01     95          90          95        90
John     Boston USA     2018  01     90          null        90        null
Mark     London UK      2019  01     83          95          83        95
Mark     London UK      2018  01     95          null        95        null 

I figured I need the LAG function for this, so I used this code:

select student,city,country,year,month,subject,marks as curr,
lag(marks,1)over(partition by student,city,country,subject order by year,month) as prev
from <table>
order by student,city,country,year,month

The output I get is:

Student City    Country Year Month Subject  Curr  Prev
John    Boston  USA    2019  01    Maths    95    90
John    Boston  USA    2018  01    Maths    90    null
John    Boston  USA    2019  01    Arts     95    90
John    Boston  USA    2018  01    Arts     90    null
Mark    London  UK     2019  01    Maths    83    95
Mark    London  UK     2018  01    Maths    95    null
Mark    London  UK     2019  01    Arts     83    95
Mark    London  UK     2018  01    Arts     95    null

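Can you help me get the desired output? Is LEAD or LAG the right function to use here? Is there any other way to achieve this in Redshift?

Any help is greatly appreciated.

I also tried this code: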

select student,city,country,year,month,subject,
case when substring(curr,1,1) = 'M' then cast(split_part(curr,' ',2) as integer) end as maths_curr,
case when substring(prev,1,1) = 'M' then cast(split_part(prev,' ',2) as integer) end as maths_prev,
case when substring(curr,1,1) = 'A' then cast(split_part(curr,' ',2) as integer) end as arts_curr,
case when substring(prev,1,1) = 'A' then cast(split_part(prev,' ',2) as integer) end as arts_prev
from
(select student,city,country,year,month,subject,
case when subject = 'MATHS' then 'M ' + cast(nvl(marks,0) as varchar)
     else 'A ' + cast(nvl(marks,0) as varchar)
     end as curr,
case when subject = 'MATHS' then 'M ' + cast(nvl(lag(marks,1)over (partition by student,city,country,subject order by year,month),0) as varchar)
     else 'A ' + cast(nvl(lag(marks,1)over (partition by student,city,country,subject order by year,month),0) as varchar)
     end as prev
from <table>
order by student,city,country,year,month)

Here the output I get is:

Student City    Country Year Month Subject  Maths_Curr  Maths_Prev   Arts_Curr   Arts_Prev
John    Boston  USA    2019  01    Maths    95          90           null        null
John    Boston  USA    2018  01    Maths    90          null         null        null
John    Boston  USA    2019  01    Arts     null        null         95          90
John    Boston  USA    2018  01    Arts     null        null         90          null
Mark    London  UK     2019  01    Maths    83          95           null        null
Mark    London  UK     2018  01    Maths    95          null         null        null
Mark    London  UK     2019  01    Arts     null        null         83          95
Mark    London  UK     2018  01    Arts     null        null         95          null

I'm not sure exactly where I went wrong. I need some guidance here.

amazon-redshift analytics
1 Answer
Votes: 1

This should do the trick:

WITH base AS (
  SELECT *,
         CASE WHEN "Subject" = 'Maths' THEN "Marks" ELSE NULL END AS maths_current,
         CASE WHEN "Subject" = 'Arts' THEN "Marks" ELSE NULL END AS arts_current,
         CASE WHEN "Subject" = 'Maths' THEN LAG("Marks") OVER (PARTITION BY "Student","City","Country","Subject" ORDER BY "Year","Month") ELSE NULL END AS previous_math,
         CASE WHEN "Subject" = 'Arts' THEN LAG("Marks") OVER (PARTITION BY "Student","City","Country","Subject" ORDER BY "Year","Month") ELSE NULL END AS previous_arts
  FROM <table>
)

SELECT "Student",
       "City",
       "Country",
       "Year",
       "Month",
       MAX(maths_current) AS Maths_curr,
       MAX(previous_math) AS Maths_prev,
       MAX(arts_current) AS Arts_curr,
       MAX(previous_arts) AS Arts_prev
FROM base
GROUP BY 1,2,3,4,5
ORDER BY 1,2,3,4 DESC,5 DESC
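
On the LEAD-versus-LAG question: with the window ordered by year and month ascending, LAG reads the earlier row, so it is the one that yields the previous year's marks. The second attempt in the question already produced the right per-subject columns; what it lacked was an aggregation step to collapse the Maths row and the Arts row of each student and year into one. The sketch below is a way to check that pattern against the sample data. It is a variant, not the exact query above: it computes LAG once in the CTE and applies the CASE expressions in the outer query. It runs on SQLite through Python's `sqlite3` module as a stand-in for Redshift (an assumption; it needs SQLite 3.25 or later for window functions), and the table name `student_marks` and the function `pivot_marks` are hypothetical.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("John", "Boston", "USA", 2018, "01", "Maths", 90),
    ("Mark", "London", "UK", 2018, "01", "Maths", 95),
    ("John", "Boston", "USA", 2019, "01", "Maths", 95),
    ("Mark", "London", "UK", 2019, "01", "Maths", 83),
    ("John", "Boston", "USA", 2018, "01", "Arts", 90),
    ("Mark", "London", "UK", 2018, "01", "Arts", 95),
    ("John", "Boston", "USA", 2019, "01", "Arts", 95),
    ("Mark", "London", "UK", 2019, "01", "Arts", 83),
]

# Step 1 (CTE): LAG per student and subject gives the previous year's marks.
# Step 2 (outer query): CASE splits marks by subject, and MAX with GROUP BY
# collapses the per-subject rows into one row per student and year.
QUERY = """
WITH base AS (
  SELECT student, city, country, year, month, subject, marks,
         LAG(marks) OVER (PARTITION BY student, city, country, subject
                          ORDER BY year, month) AS prev_marks
  FROM student_marks
)
SELECT student, city, country, year, month,
       MAX(CASE WHEN subject = 'Maths' THEN marks END)      AS maths_curr,
       MAX(CASE WHEN subject = 'Maths' THEN prev_marks END) AS maths_prev,
       MAX(CASE WHEN subject = 'Arts'  THEN marks END)      AS arts_curr,
       MAX(CASE WHEN subject = 'Arts'  THEN prev_marks END) AS arts_prev
FROM base
GROUP BY student, city, country, year, month
ORDER BY student, year DESC, month DESC
"""


def pivot_marks():
    """Load the sample data into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student_marks (student TEXT, city TEXT, country TEXT, "
        "year INTEGER, month TEXT, subject TEXT, marks INTEGER)"
    )
    conn.executemany("INSERT INTO student_marks VALUES (?,?,?,?,?,?,?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = pivot_marks()
for row in result:
    print(row)
```

On the sample data this returns four rows, one per student and year, with the 2018 rows carrying NULL (`None` in Python) in the two `_prev` columns, which matches the desired output in the question. MAX works here only because each group has at most one non-NULL value per column; it serves purely to skip the NULLs.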