How do I remove duplicates from query results if I know the table and column causing them? [duplicate]

Question · Votes: 0 · Answers: 1

I have a huge query with a lot of JOINs, and it is producing duplicates.

I'm using the technique below, which I found on SO, to identify which table the duplicates come from:

SELECT
   TableA = '----------', TableA.*,
   TableB = '----------', TableB.*
FROM ...
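As a sketch of a complementary check: once the marker columns point at a suspect table, counting rows per join key shows whether that table matches more than one row per key. The table variable @TableC, its columns, and its rows are hypothetical stand-ins, not the real schema.

```sql
-- Hypothetical stand-in for the suspect table; names and rows are illustrative only.
DECLARE @TableC TABLE (LOCATION_CODE char(2), SCI_YEAR_CODE varchar(20));

INSERT INTO @TableC (LOCATION_CODE, SCI_YEAR_CODE)
VALUES ('ND', '2016_AAB'), ('ND', '2017_RRT'),
       ('MH', '2016_AAB'), ('MH', '2017_RRT'),
       ('BS', '2017_RRT');

-- Any key returned here matches more than one row, so joining on it
-- multiplies the rows coming from the other side of the join.
SELECT LOCATION_CODE, COUNT(*) AS RowsPerKey
FROM @TableC
GROUP BY LOCATION_CODE
HAVING COUNT(*) > 1
ORDER BY RowsPerKey DESC, LOCATION_CODE;
```

With these rows, MH and ND come back with two rows each, while BS has a single row and is not listed.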

Here is a sample of the data:

TABLE_A     USER_ID             TABLE_B                 LOCATION                    USER_CODE   LOCATION_CODE   TABLE_C                     SCI_YEAR_CODE
USER        1092993811          COL_PATHS_SCIENCE_ED    University Of N. Maryland   NULL        ND              BIO_PATHS_SCIENCE_RESEARCH  2016_AAB
USER        1092993811          COL_PATHS_SCIENCE_ED    University Of N. Maryland   NULL        ND              BIO_PATHS_SCIENCE_RESEARCH  2017_RRT
USER        1092993811          COL_PATHS_SCIENCE_ED    University Of N. Maryland   NULL        ND              BIO_PATHS_SCIENCE_RESEARCH  2016_AAB
USER        1092993811          COL_PATHS_SCIENCE_ED    University Of N. Maryland   NULL        ND              BIO_PATHS_SCIENCE_RESEARCH  2017_RRT
USER        1092993811          COL_PATHS_SCIENCE_ED    California of College       NULL        MH              BIO_PATHS_SCIENCE_RESEARCH  2016_AAB
USER        1092993811          COL_PATHS_SCIENCE_ED    California of College       NULL        MH              BIO_PATHS_SCIENCE_RESEARCH  2017_RRT
USER        1092993811          COL_PATHS_SCIENCE_ED    California of College       NULL        MH              BIO_PATHS_SCIENCE_RESEARCH  2016_AAB
USER        1092993811          COL_PATHS_SCIENCE_ED    California of College       NULL        MH              BIO_PATHS_SCIENCE_RESEARCH  2017_RRT
USER        1092993811          COL_PATHS_SCIENCE_ED    New York City Tech          NULL        BS              BIO_PATHS_SCIENCE_RESEARCH  2016_AAB
USER        1092993811          COL_PATHS_SCIENCE_ED    New York City Tech          NULL        BS              BIO_PATHS_SCIENCE_RESEARCH  2017_RRT
USER        1092993811          COL_PATHS_SCIENCE_ED    New York City Tech          NULL        BS              BIO_PATHS_SCIENCE_RESEARCH  2016_AAB
USER        1092993811          COL_PATHS_SCIENCE_ED    New York City Tech          NULL        BS              BIO_PATHS_SCIENCE_RESEARCH  2017_RRT
USER        1092993811          COL_PATHS_SCIENCE_ED    New York City Tech          NULL        BS              BIO_PATHS_SCIENCE_RESEARCH  2016_AAB
USER        1092993811          COL_PATHS_SCIENCE_ED    New York City Tech          NULL        BS              BIO_PATHS_SCIENCE_RESEARCH  2017_RRT
USER        1092993811          COL_PATHS_SCIENCE_ED    New York City Tech          NULL        BS              BIO_PATHS_SCIENCE_RESEARCH  2016_AAB
USER        1092993811          COL_PATHS_SCIENCE_ED    New York City Tech          NULL        BS              BIO_PATHS_SCIENCE_RESEARCH  2017_RRT

You can see that the column causing the most duplicates comes from TABLE_C (BIO_PATHS_SCIENCE_RESEARCH).

For SCI_YEAR_CODE, I only need the most recent value, and only values of SCI_YEAR_CODE that end in RRT.

Is there a way to "weed out" these duplicates?

sql t-sql join sql-server-2012
1 Answer

1 vote

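You can use ROW_NUMBER() to number the rows within each USER_ID and LOCATION_CODE partition, ordered by SCI_YEAR_CODE descending, and then keep only the rows where RowNum = 1. Two details matter. The PARTITION BY list must name columns (TABLE_C is a table, not a column), and the RRT filter has to run before the rows are numbered, otherwise a group whose latest code does not end in RRT drops out of the result entirely. The join conditions below are guesses, since the question does not show them:

    SELECT *
    FROM (
        SELECT
            ROW_NUMBER() OVER (
                PARTITION BY TABLE_A.USER_ID, TABLE_B.LOCATION_CODE
                ORDER BY TABLE_C.SCI_YEAR_CODE DESC
            ) AS RowNum,
            -- List columns explicitly: TABLE_A.*, TABLE_B.*, TABLE_C.* would
            -- repeat the join columns, and a derived table needs unique names.
            TABLE_A.USER_ID,
            TABLE_B.LOCATION,
            TABLE_B.LOCATION_CODE,
            TABLE_C.SCI_YEAR_CODE
        FROM
            TABLE_A
        JOIN
            TABLE_B ON TABLE_A.USER_ID = TABLE_B.USER_ID
        JOIN
            TABLE_C ON TABLE_B.LOCATION_CODE = TABLE_C.LOCATION_CODE
        WHERE
            -- Filter before numbering so RowNum = 1 is the latest RRT code.
            TABLE_C.SCI_YEAR_CODE LIKE '%RRT'
    ) AS sub
    WHERE
        sub.RowNum = 1;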
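
To try the ROW_NUMBER() approach in isolation, the sketch below loads a few rows modelled on the question's sample output into a table variable and deduplicates them. The table variable @Joined and its column types are assumptions made for illustration. The RRT filter is applied before numbering, so each partition keeps its latest RRT row even if a later non-RRT code exists.

```sql
-- Illustrative rows modelled on the question's sample output (one joined row per line).
DECLARE @Joined TABLE (
    USER_ID       bigint,
    LOCATION      varchar(50),
    LOCATION_CODE char(2),
    SCI_YEAR_CODE varchar(20)
);

INSERT INTO @Joined (USER_ID, LOCATION, LOCATION_CODE, SCI_YEAR_CODE)
VALUES
    (1092993811, 'University Of N. Maryland', 'ND', '2016_AAB'),
    (1092993811, 'University Of N. Maryland', 'ND', '2017_RRT'),
    (1092993811, 'University Of N. Maryland', 'ND', '2017_RRT'),
    (1092993811, 'California of College',     'MH', '2016_AAB'),
    (1092993811, 'California of College',     'MH', '2017_RRT'),
    (1092993811, 'New York City Tech',        'BS', '2016_AAB'),
    (1092993811, 'New York City Tech',        'BS', '2017_RRT'),
    (1092993811, 'New York City Tech',        'BS', '2017_RRT');

WITH Ranked AS (
    SELECT
        j.*,
        -- Number the rows inside each user/location group, newest code first.
        ROW_NUMBER() OVER (
            PARTITION BY j.USER_ID, j.LOCATION_CODE
            ORDER BY j.SCI_YEAR_CODE DESC
        ) AS RowNum
    FROM @Joined AS j
    WHERE j.SCI_YEAR_CODE LIKE '%RRT'   -- keep only RRT codes before numbering
)
SELECT USER_ID, LOCATION, LOCATION_CODE, SCI_YEAR_CODE
FROM Ranked
WHERE RowNum = 1
ORDER BY LOCATION_CODE;
```

On these rows the query returns one row per LOCATION_CODE (BS, MH, ND), each with 2017_RRT. Ordering SCI_YEAR_CODE as text picks the latest year only because every code starts with a four-digit year.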