SQL data aggregation based on dates

Question (votes: 0, answers: 4)

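I'm sure this is a really stupid question and I'm just having a dumb moment. Consider the following basic scenario (a very small one compared with the real thing, which has many dimensions and measures):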

Data

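What I need to get is the expected output: all costs between the input_date and output_date defined in the params, but reported only against the latest PID, defined as follows:

1 - Where PIDs run sequentially or overlap, use the latest PID based on date_to, as long as the two are not both active at @output_date
2 - Where two PIDs are both active at @output_date, show both

I can't for the life of me work out how to do this in SQL. Note that it has to be non-dynamic and, unfortunately, must not use any CTEs: just basic SQL with subqueries.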

Obviously, returning the list of the necessary IDs and PIDs is easy:

declare @input_date date ='2006-01-01'
declare @output_date date ='2006-12-31'

select a.PID, a.ID
from #tmp a
where date_from <=@output_date and date_to >=@input_date

But I can't work out how to join to this to return the correct cost values.

drop table tmp
CREATE TABLE [dbo].[tmp](
       [date_from] [datetime] NOT NULL,
       [date_to] [datetime] NOT NULL,
       [ID] [nvarchar](25) NOT NULL,
       [PID] [nvarchar](25) NOT NULL,
       [cost] [float] NULL
) ON [PRIMARY]
INSERT tmp VALUES('2005-1-1','2005-1-31','10001','X123',1254.32)
INSERT tmp VALUES('2000-10-10','2006-8-21','10005','TEST01',21350.9636378758)
INSERT tmp VALUES('2006-8-22','2099-12-31','10005','TEST02',22593.4926163943)
INSERT tmp VALUES('2006-1-1','2099-12-31','10006','X01',22458.3342354444)
INSERT tmp VALUES('2006-2-8','2099-12-31','10006','X02',22480.3772331959)
INSERT tmp VALUES('2006-1-1','2006-2-7','10007','AB01',565.416874152212)
INSERT tmp VALUES('2006-2-8','2006-7-31','10007','AA05',19108.3206482165)

I've made some progress using a CTE, so you can see how I would do this if I could:

drop table #tmp 


CREATE TABLE #tmp (
       [date_from] [datetime] NOT NULL,
       [date_to] [datetime] NOT NULL,
       [ID] [nvarchar](25) NOT NULL,
       [PID] [nvarchar](25) NOT NULL,
       [cost] [float] NULL
) ON [PRIMARY]
INSERT #tmp  VALUES('2005-1-1','2005-1-31','10001','X123',1254.32)
INSERT #tmp  VALUES('2000-10-10','2006-8-21','10005','TEST01',21350.9636378758)
INSERT #tmp  VALUES('2006-8-22','2099-12-31','10005','TEST02',22593.4926163943)
INSERT #tmp  VALUES('2006-1-1','2099-12-31','10006','X01',22458.3342354444)
INSERT #tmp  VALUES('2006-2-8','2099-12-31','10006','X02',22480.3772331959)
INSERT #tmp  VALUES('2006-1-1','2006-2-7','10007','AB01',565.416874152212)
INSERT #tmp  VALUES('2006-2-8','2006-7-31','10007','AA05',19108.3206482165)

declare @input_date date ='2006-01-01'
declare @output_date date ='2006-12-31'


;with cte as (
select t.id,t.PID,t.cost,t.date_from,t.date_to , 
        iif(date_To >= @output_date  OR max_date_To is not null,PID,NULL) as PID2,
        b.total_id_cost 
    from #tmp  t
    left join (select ID,max(date_to) as max_date_to
                from #tmp
                where date_from <=@output_date and date_to >=@input_date
                group by ID) a
    on t.ID = a.ID and t.date_to = a.max_date_to
    left join (Select ID, sum(cost) as total_id_cost
                from  #tmp
                where date_from <=@output_date and date_to >=@input_date
                group by ID) b
    on t.ID = b.ID
    where date_from <=@output_date and date_to >=@input_date )


select distinct ID,PID2,
iif(ID in (
            select ID   
            from cte
            where PID2 IS NULL) 
and ID not in (select ID    
            from cte
            where PID IS NOT NULL
            group by ID
            having count (distinct PID2) >1  ), cte.total_id_cost, cost) as cost
from cte
where PID2 is not null;
sql sql-server
4 answers
1
vote

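So it looks like there are a few problems to solve in one query.

  1. We want the PID that matches the latest date. This isn't too difficult and can be solved by joining the data to an aggregate of the same data that finds the latest date.
  2. If two PIDs are both active, i.e. their from and to dates overlap, then both must be shown. I found this trickier. In the end I wrote a query that finds the rows that overlap and satisfy the dates, and counted them. That count is then used as a join condition for 1, so that it can conditionally pick the PID matching the latest date.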
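Point 1 can be sketched on its own as follows. This is only an illustration, not the answer's final query: @Sample is a hypothetical stand-in for the question's table, the costs are rounded, and the alias names latest and max_date_to are made up for the example.

```sql
-- Sketch only: @Sample is a hypothetical stand-in for the question's #tmp table.
DECLARE @Sample TABLE (date_from DATE, date_to DATE, ID INT, PID NVARCHAR(25), cost FLOAT);
INSERT @Sample VALUES
    ('2000-10-10', '2006-08-21', 10005, 'TEST01', 21350.96),
    ('2006-08-22', '2099-12-31', 10005, 'TEST02', 22593.49),
    ('2006-01-01', '2006-02-07', 10007, 'AB01',   565.42),
    ('2006-02-08', '2006-07-31', 10007, 'AA05',   19108.32);

DECLARE @input_date DATE = '2006-01-01';
DECLARE @output_date DATE = '2006-12-31';

-- For each ID, find the latest date_to among the rows that touch the date range,
-- then join back to the data to pick up the PID that owns that date.
SELECT s.ID, s.PID, s.date_to
FROM @Sample AS s
INNER JOIN (
    SELECT ID, MAX(date_to) AS max_date_to
    FROM @Sample
    WHERE date_from <= @output_date AND date_to >= @input_date
    GROUP BY ID
) AS latest
    ON s.ID = latest.ID AND s.date_to = latest.max_date_to
ORDER BY s.ID;
```

With this sample data the join keeps TEST02 for ID 10005 and AA05 for ID 10007. When two PIDs share the same latest date_to, both rows come back, which is why the overlap case needs the extra handling described in point 2.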

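Then, finally, using the result of the above, you can do the sum to get the cost. The resulting query is a bit of a monster, but here it is. Let me know if it doesn't cover other cases that weren't detailed.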

DECLARE @Data TABLE (date_from DATETIME, date_to DATETIME, ID INT, PID NVARCHAR(50), COST MONEY)
INSERT @Data VALUES('2005-1-1','2005-1-31','10001','X123',1254.32)
INSERT @Data VALUES('2000-10-10','2006-8-21','10005','TEST01',21350.9636378758)
INSERT @Data VALUES('2006-8-22','2099-12-31','10005','TEST02',22593.4926163943)
INSERT @Data VALUES('2006-1-1','2099-12-31','10006','X01',22458.3342354444)
INSERT @Data VALUES('2006-2-8','2099-12-31','10006','X02',22480.3772331959)
INSERT @Data VALUES('2006-1-1','2006-2-7','10007','AB01',565.416874152212)
INSERT @Data VALUES('2006-2-8','2006-7-31','10007','AA05',19108.3206482165)

declare @input_date date ='2006-01-01'
declare @output_date date ='2006-12-31'


select
    a.ID,
    PIDForMaxDateThatMatches.PID,
    SUM(a.cost) as cost
from
    @Data a
    inner join (
        -- number of PIDs for dates that overlap grouped by ID
        select
            a.ID,
            -- where there's no overlap then we want the count to be 1 so that later we can use it as condition
            COUNT(DISTINCT ISNULL(b.PID,'')) as NumberOfPID
        from
            @Data a
            -- may or may not find overlaps
            LEFT JOIN @data b ON
                b.date_from <=@output_date and
                b.date_to >=@input_date and
                a.date_from <= b.date_to and
                a.date_to >= b.date_from and
                a.ID = b.ID and
                a.PID <> b.PID
        where
            a.date_from <=@output_date and
            a.date_to >=@input_date
        group by
            a.ID) as PIDCountForOverlappingMatches ON
        a.ID = PIDCountForOverlappingMatches.ID
    left join (
        -- get the PID that matches the max date_to 
        select
            DataForMaxDate.ID,
            DataForMaxDate.date_from,
            DataForMaxDate.date_to,
            DataForMaxDate.PID
        from
            @Data as DataForMaxDate
            inner join (
                -- get the max date_to that matches the criteria
                select
                    ID,
                    MAX(date_to) as maxDateTo
                from
                    @Data a
                where
                    date_from <=@output_date and
                    date_to >=@input_date
                group by
                    ID) as MaxToDatePerID on
            DataForMaxDate.ID = MaxToDatePerID.ID and
            DataForMaxDate.date_to = MaxToDatePerID.maxDateTo) as PIDForMaxDateThatMatches on
        a.ID = PIDForMaxDateThatMatches.ID AND
        -- if there's no overlapping dates the PID count would be 1, which we'll take the PID that matches the max(date_to)
        -- but if there is overlap, then we want both dates to show, thus the from date must also match before we take the PID
        (PIDCountForOverlappingMatches.NumberOfPID = 1 OR a.date_from = PIDForMaxDateThatMatches.date_from)

where
    a.date_from <= @output_date and
    a.date_to >= @input_date
GROUP BY
    a.ID,
    PIDForMaxDateThatMatches.PID
ORDER BY
    a.ID    
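The overlap count that drives the conditional join can be inspected in isolation. The sketch below repeats that derived table against a hypothetical @Sample table variable (costs omitted), so it is a cut-down illustration rather than a replacement for the query above.

```sql
-- Sketch only: @Sample is a hypothetical stand-in holding two of the question's IDs.
DECLARE @Sample TABLE (date_from DATE, date_to DATE, ID INT, PID NVARCHAR(25));
INSERT @Sample VALUES
    ('2000-10-10', '2006-08-21', 10005, 'TEST01'),
    ('2006-08-22', '2099-12-31', 10005, 'TEST02'),
    ('2006-01-01', '2099-12-31', 10006, 'X01'),
    ('2006-02-08', '2099-12-31', 10006, 'X02');

DECLARE @input_date DATE = '2006-01-01';
DECLARE @output_date DATE = '2006-12-31';

-- Self-join each row to the other PIDs of the same ID whose dates overlap it.
-- Rows with no overlapping partner contribute '' so the count comes out as 1.
SELECT a.ID,
       COUNT(DISTINCT ISNULL(b.PID, '')) AS NumberOfPID
FROM @Sample AS a
LEFT JOIN @Sample AS b
    ON  a.ID = b.ID
    AND a.PID <> b.PID
    AND a.date_from <= b.date_to
    AND a.date_to >= b.date_from
    AND b.date_from <= @output_date
    AND b.date_to >= @input_date
WHERE a.date_from <= @output_date AND a.date_to >= @input_date
GROUP BY a.ID
ORDER BY a.ID;
```

Here ID 10005 gets NumberOfPID = 1 because its two PIDs run back to back, while ID 10006 gets 2 because X01 and X02 overlap. A count of 1 is what lets the outer join collapse all costs onto the single latest PID.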

Edit: DB Fiddle: http://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=d43cb4b9765da1bca035531e78a2c77d

Result:

ID      PID     cost
--------------------------
10005   TEST02  43944.4562
10006   X01     22458.3342
10006   X02     22480.3772
10007   AA05    19673.7375


0
votes

Hello, you can try the following query:

select a.ID, max(a.PID) PID, SUM(a.cost) Cost
from #tmp a
where date_from <= @output_date and date_to >= @input_date
group by a.ID
order by a.ID;


0
votes

I think this might work:

SELECT
    t1.ID, 
    q1.PID, 
    SUM(t1.cost)
FROM
    #tmp AS t1  -- the question's sample table
JOIN
(
SELECT
    q2.ID,
    t2.PID
FROM
    (
    SELECT
        ID, 
        MAX(date_to) AS maxdate
    FROM
        #tmp
    GROUP BY
        ID
    ) AS q2
JOIN
    #tmp AS t2
ON
    q2.ID = t2.ID
AND 
    q2.maxdate = t2.date_to
) AS q1
ON
    t1.ID = q1.ID
AND
    t1.PID = q1.PID
GROUP BY
    t1.ID, 
    q1.PID

0
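votes

Here is a query without a CTE. The idea behind the query:

1) Find consecutive dates and form separate groups within each id

2) Find the min and max dates and the sum of cost for each group

3) Apply the input parameter restriction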
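Step 1 carries most of the logic, so here is a minimal sketch of it in isolation. It assumes SQL Server 2012 or later and swaps the row_number() self-joins for LAG and a running SUM, which is a different technique from the one this answer uses; @Sample and the column name is_start are made up for illustration.

```sql
-- Sketch only: @Sample is a hypothetical stand-in for the answer's @myTable.
DECLARE @Sample TABLE (date_from DATE, date_to DATE, id INT, pid VARCHAR(30));
INSERT @Sample VALUES
    ('20001010', '20060821', 10005, 'test01'),
    ('20060822', '20991231', 10005, 'test02'),
    ('20060101', '20991231', 10006, 'x01'),
    ('20060208', '20991231', 10006, 'x02');

SELECT id, pid, date_from, date_to,
       -- running total of the "starts a new group" flags = group number within the id
       SUM(is_start) OVER (PARTITION BY id ORDER BY date_from
                           ROWS UNBOUNDED PRECEDING) AS grp
FROM (
    SELECT id, pid, date_from, date_to,
           -- 0 when this row starts the day after the previous row of the id ended, else 1
           CASE WHEN DATEDIFF(dd,
                              LAG(date_to) OVER (PARTITION BY id ORDER BY date_from),
                              date_from) = 1
                THEN 0 ELSE 1 END AS is_start
    FROM @Sample
) AS flagged
ORDER BY id, date_from;
```

With this sample, both rows of id 10005 land in grp 1 because test02 starts the day after test01 ends, while the overlapping rows of id 10006 get grp 1 and grp 2. The full query from this answer, which builds the same grouping with self-joins, follows.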

declare @date_from date = '20060101'
declare @date_to date = '20061231'

declare @myTable table(
    date_from date
    , date_to date
    , id int
    , pid varchar(30)
    , cost decimal(10,2)
)
insert into @myTable values
    ('20050101', '20050201', 10001, 'x123', 1254.32)
    , ('20001010', '20060821', 10005, 'test01', 21350.96)
    , ('20060822', '20991231', 10005, 'test02', 22593.49)
    , ('20060101', '20991231', 10006, 'x01', 22548.33)
    , ('20060208', '20991231', 10006, 'x02', 22480.38)
    , ('20060101', '20060207', 10007, 'abo1', 565.42)
    , ('20060208', '20060731', 10007, 'abo2', 19108.32)

select
    date_from = min(date_from), date_to = max(date_to)
    , id, pid = max(case when date_to = max_date_to then pid end)
    , cost = sum(cost)
from (
    select
        a.date_from, a.date_to, a.id, a.pid, a.cost, a.rn, grp = sum(b.ss)
        , max_date_to = max(a.date_to) over (partition by a.id, sum(b.ss))
    from
        (
            select
                a.*, ss = case when datediff(dd, b.date_to, a.date_from) = 1 then 0 else 1 end
            from
                (
                    select
                        *, rn = row_number() over (partition by id order by date_from)
                    from
                        @myTable
                ) a
                left join (
                    select
                        *, rn = row_number() over (partition by id order by date_from)
                    from
                        @myTable
                ) b on a.id = b.id and a.rn - 1 = b.rn
        ) a
        left join (
            select
                a.*, ss = case when datediff(dd, b.date_to, a.date_from) = 1 then 0 else 1 end
            from
                (
                    select
                        *, rn = row_number() over (partition by id order by date_from)
                    from
                        @myTable
                ) a
                left join (
                    select
                        *, rn = row_number() over (partition by id order by date_from)
                    from
                        @myTable
                ) b on a.id = b.id and a.rn - 1 = b.rn
        ) b on a.id = b.id and a.rn >= b.rn
    group by a.date_from, a.date_to, a.id, a.pid, a.cost, a.rn
) t
group by id, grp, max_date_to
having min(date_from) <= @date_from and max(date_to) >= @date_to
order by id

Output

date_from   date_to     id      pid     cost
------------------------------------------------
2000-10-10  2099-12-31  10005   test02  43944.45
2006-01-01  2099-12-31  10006   x01     22548.33

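The result differs slightly from the output you provided. But:

1) For id = 10006, pid = X02, date_from = 08/02/2006, whereas the input is 01/01/2006

2) For id = 10007, date_to = 31/07/2006, whereas the input is 31/12/2006

So I think the query works correctly.

Rextester demo in a more readable format, using a CTE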
