Optimizing a SQL query: joins between views are extremely slow


The following question relates to Microsoft SQL Azure (RTM) - 12.0.2000.8.

I have an invoice dataset that looks like this (raw_data.invoices):

invoice_id  invoice_date  institution  billed_to   item    qty
12345       2024-07-12    1111         John Smith  Phone   20
12345       2024-07-12    1111         John Smith  Button  5
12345       2024-07-12    1111         Jane Smith  Button  2
12346       2024-07-05    1111         John Smith  Phone   20
12346       2024-07-05    1111         Jane Smith  Button  2

I have a view that tidies up the table above according to some business requirements (myview.invoices):

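select
    D.invoice_date,
    D.invoice_id,
    D.institution,
    -- raw catalogue name, aliased so the view's column names stay unique
    C.institution_name as institution_full_name,
    -- cleaned name: the text before the first '-', trimmed and lower-cased
    lower(trim(substring(C.institution_name, 1, charindex('-', C.institution_name)-1))) as institution_name,
    D.billed_to,
    D.item,
    D.qty
from raw_data.invoices D
left join catalogues.institutions C
on
    C.institution_code = D.institution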

and a view that identifies the two most recent invoice dates for each institution (myview.last_2_inv_cycles):

select 
    A.institution
    , A.current_inv_cycle
    , B.last_inv_cycle
from (
    select
        x.institution
        , max(x.invoice_date) as current_inv_cycle
    from myview.invoices x
    group by
        x.institution
) A
inner join (
    select
        z.institution
        , max(z.invoice_date) as last_inv_cycle
    from (
        select
            x.institution
            , x.invoice_date
        from myview.invoices x
        where concat(x.institution, x.invoice_date) not in (
            select concat(y.institution, max(y.invoice_date))
            from myview.invoices y
            group by y.institution 
        )
    ) z
    group by
        z.institution
) B
on A.institution=B.institution

Finally, I join these two views to identify any invoice lines with qty > 15 on the two most recent invoices (invoices are received weekly); this is myview.qty_over_15:

with over_2_weeks as (
    select
        x.institution,
        x.billed_to,
        x.item
    from myview.invoices x
    inner join myview.last_2_inv_cycles y
    on x.institution = y.institution
        and (x.invoice_date = y.last_inv_cycle or x.invoice_date = y.current_inv_cycle)
    group by
        x.institution,
        x.billed_to,
        x.item
    having
        sum(case when x.qty > 15 then 1 else 0 end) >= 2

        -- exceptions defined by the business
        and x.institution <> '2222'
)

select
    A.invoice_date,
    A.invoice_id,
    A.institution,
    A.billed_to,
    A.item,
    A.qty
from myview.invoices A
inner join over_2_weeks D

-- problematic join; takes over an hour
on
    A.institution=D.institution
    AND A.billed_to=D.billed_to
    AND A.item=D.item

inner join myview.last_2_inv_cycles C
on
    A.institution=C.institution
    and A.invoice_date=C.current_inv_cycle

-- more exception list
where
    A.billed_to not in (
        'Jake Johnson', 'Bill Gates'
    )

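As you can see, the query is overly complex and takes a very long time (the final view still fails to load after running for more than 4 hours).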

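myview.qty_over_15 has the same cosmetic requirements as myview.invoices, which is why I run the query against the view rather than the raw_data.invoices table; I would like to keep it that way unless there is a better way to achieve the same goal.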

As for the indexes on raw_data.invoices:

create index idx_search_invoice_id
on raw_data.invoices(invoice_id)

create index idx_search_invoice_date
on raw_data.invoices(invoice_date)

create index idx_search_institution
on raw_data.invoices(institution)

create index idx_invoice_of_institution
on raw_data.invoices(invoice_id, institution)

create index idx_search_billed_to
on raw_data.invoices(institution, billed_to)

create index idx_search_billed_to_item
on raw_data.invoices(billed_to, item)

create index idx_search_bill
on raw_data.invoices(qty, billed_to)

create index idx_search_item_charge
on raw_data.invoices(
    institution
    , invoice_id
    , billed_to
    , item
)

Please help me with these queries. I'm not sure where to look for a speed-up.

sql azure t-sql sql-server-2012 query-optimization
1 Answer

I believe all of these queries can be greatly simplified by using a single window function, DENSE_RANK:

WITH invoices AS (
    SELECT
        D.invoice_date,
        D.invoice_id,
        D.institution,
        -- raw catalogue name, aliased so the CTE's column names stay unique
        C.institution_name AS institution_full_name,
        LOWER(TRIM(SUBSTRING(C.institution_name, 1, CHARINDEX('-', C.institution_name)-1))) AS institution_name,
        -- 1 = latest invoice date for the institution, 2 = the one before it
        DENSE_RANK() OVER(PARTITION BY D.institution ORDER BY D.invoice_date DESC) AS invoice_rank,
        D.billed_to,
        D.item,
        D.qty
    FROM raw_data.invoices D
    LEFT JOIN catalogues.institutions C
    ON
        C.institution_code = D.institution
)

SELECT * FROM invoices
WHERE invoice_rank IN (1,2)
AND institution <> '2222'
AND billed_to NOT IN (
        'Jake Johnson', 'Bill Gates'
    )
AND qty > 15
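
A query shaped like this partitions by institution and orders by invoice_date, so an index keyed the same way may let the engine avoid a full sort. The following is only a sketch: the index name is hypothetical, the INCLUDE list assumes these are the only columns the query reads from raw_data.invoices, and whether it actually helps should be checked against the execution plan.

```sql
-- Hypothetical index name; the key order mirrors
-- PARTITION BY institution ORDER BY invoice_date DESC.
CREATE NONCLUSTERED INDEX idx_institution_invoice_date
ON raw_data.invoices (institution, invoice_date DESC)
-- Assumed covering columns, so the ranking step does not need key lookups.
INCLUDE (invoice_id, billed_to, item, qty);
```

If it turns out to be useful, some of the overlapping single-column indexes on the table could probably be dropped, since every extra index adds write overhead.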

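I'm not sure what you are trying to achieve with sum(case when x.qty > 15 then 1 else 0 end) >= 2. If you can clarify why a plain qty > 15 filter is not suitable, I can adjust the query above to meet the requirement.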
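
One possible reading of that condition is "the same institution, billed_to and item has at least two lines with qty > 15 across the two most recent invoice dates". If that is the intent, the rule can be kept with a windowed SUM instead of a second join. This is a sketch under that assumption, not a verified replacement: the CTE names ranked and flagged and the column lines_over_15 are made up for illustration, and it reads raw_data.invoices directly on the assumption that the cosmetic columns can be joined back after filtering.

```sql
WITH ranked AS (
    SELECT
        D.invoice_date,
        D.invoice_id,
        D.institution,
        D.billed_to,
        D.item,
        D.qty,
        -- 1 = latest invoice date per institution, 2 = the one before it
        DENSE_RANK() OVER (PARTITION BY D.institution
                           ORDER BY D.invoice_date DESC) AS invoice_rank
    FROM raw_data.invoices AS D
    -- business exception from the original HAVING clause
    WHERE D.institution <> '2222'
),
flagged AS (
    SELECT
        r.*,
        -- lines with qty > 15 for this institution / billed_to / item,
        -- counted over the two latest invoice dates only
        SUM(CASE WHEN r.qty > 15 THEN 1 ELSE 0 END)
            OVER (PARTITION BY r.institution, r.billed_to, r.item) AS lines_over_15
    FROM ranked AS r
    WHERE r.invoice_rank <= 2
)
SELECT
    f.invoice_date,
    f.invoice_id,
    f.institution,
    f.billed_to,
    f.item,
    f.qty
FROM flagged AS f
WHERE f.lines_over_15 >= 2
  AND f.invoice_rank = 1   -- return only lines from the latest invoice date
  AND f.billed_to NOT IN ('Jake Johnson', 'Bill Gates');
```

Two differences from the original views are worth checking against real data: an institution with only one invoice date is kept here, whereas the inner join in myview.last_2_inv_cycles drops it, and rows with a NULL item are grouped together here rather than excluded by the join.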
