How to limit an INNER JOIN to a single row in an UPDATE


I'm trying to optimize some very ugly queries. I have a query here that looks up a state's abbreviation, because we only work with abbreviations.

UPDATE [data_log] 
   SET [h_data] = COALESCE((SELECT TOP(1) [state_abbr] FROM [CityStateInfo] WHERE [state_long] = [h_data]), [h_data])
 WHERE [field] = '[MailingState]'
   AND LEN([h_data]) > 3
   AND [h_data] IS NOT NULL

The data log is just a table where I track changes that need to be made or reviewed.

CREATE TABLE [data_log] (
    [id]        int identity(1,1),
    [dataID]    bigint,
    [field]     varchar(128),
    [sf_data]   varchar(500),
    [h_data]    varchar(500),
    [score]     float,
    [action]    varchar(128)
);

INSERT INTO [Data_log] VALUES 
(3605013844, '[MailingCity]', 'Flat', 'Flatt', NULL, NULL),
(3605013844, '[MailingState]', 'KY', 'Kentucky', NULL, NULL),
(3605013844, '[MailingZIP]', '41301', '41301', NULL, NULL),
(1874281127, '[MailingCity]', 'EDMONTON', 'Edmonton', NULL, NULL),
(1874281127, '[MailingState]', 'AB', 'Alberta', NULL, NULL),
(1874281127, '[MailingZIP]', 'T6M 2K1', 'T6M 2K1', NULL, NULL),
(2077170855, '[MailingCity]', 'Van Buren Point', 'Van Buren Point', NULL, NULL),
(2077170855, '[MailingState]', 'NY', 'New York', NULL, NULL),
(2077170855, '[MailingZIP]', '14166', '14166', NULL, NULL),
(1874281127, '[MailingState]', 'PA', 'Ontario', NULL, NULL),
(1874281127, '[MailingState]', 'IL', 'Missouri', NULL, NULL)

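[CityStateInfo] holds a large amount of information about the United States, Canada, Mexico, and Europe. It has 3,764,649 rows. It contains every city/state/postal code combination around the world, plus other information.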

CREATE TABLE [dbo].[CityStateInfo](
    [City] [varchar](255) NULL,
    [State_abbr] [varchar](10) NULL,
    [State_long] [varchar](50) NULL,
    [Zip] [varchar](20) NULL,
    [County] [varchar](50) NULL,
    [Country] [varchar](50) NULL,
    [Longitude] [varchar](15) NULL,
    [Latitude] [varchar](15) NULL,
    [StateFIPS] [varchar](10) NULL,
    [CountryFIPS] [varchar](10) NULL,
    [TimeZone] [int] NULL,
    [cleanCity] [varchar](255) NULL,
    [Country_abbr] [varchar](10) NULL,
    [foreignCity] [varchar](255) NULL,
    [foreignState] [varchar](255) NULL
)

INSERT INTO [CityStateInfo] VALUES
('AARON','KY','Kentucky','42602','RUSSELL','United States','-85.121708','36.751734','21','207','6','AARON','US',NULL,NULL),
('AARON','KY','Kentucky','42602','CLINTON','United States','-85.121708','36.751734','21','053','6','AARON','US',NULL,NULL),
('ADRIAN','MO','Missouri','64720','BATES','United States','-94.398772','38.433513','29','013','6','ADRIAN','US',NULL,NULL),
('ADVANCE','MO','Missouri','63730','CAPE GIRARDEAU','United States','-89.911055','37.058424','29','031','6','ADVANCE','US',NULL,NULL),
('SHIRLEY','NY','New York','11967','SUFFOLK','United States','-72.880184','40.794219','36','103','5','SHIRLEY','US',NULL,NULL),
('SHOKAN','NY','New York','12481','ULSTER','United States','-74.214799','41.982148','36','111','5','SHOKAN','US',NULL,NULL),
('KANATA','ON','Ontario','K2M 0A8',NULL,'Canada',NULL,NULL,NULL,NULL,'5','KANATA','CA',NULL,NULL),
('KANATA','ON','Ontario','K2M 0A9',NULL,'Canada',NULL,NULL,NULL,NULL,'5','KANATA','CA',NULL,NULL),
('EDMONTON','AB','Alberta','T6H 0J5',NULL,'Canada',NULL,NULL,NULL,NULL,'7','EDMONTON','CA',NULL,NULL),
('EDMONTON','AB','Alberta','T6H 0J6',NULL,'Canada',NULL,NULL,NULL,NULL,'7','EDMONTON','CA',NULL,NULL)

I thought I could do something like this:

    UPDATE [data_log]
       SET [data_log].[h_data] = c.[state_abbr]
      FROM [data_log] d
INNER JOIN [CityStateInfo] c
        ON d.[h_data] = c.[state_long]
     WHERE d.[field] = '[MailingState]'
       AND LEN([h_data]) > 3
       AND [h_data] IS NOT NULL

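However, when I run a SELECT with a similar setup, I get millions of rows, because each state I'm looking up can have dozens or even thousands of rows. The query above does seem to get me what I want, but I want to make sure I'm not pulling millions of rows just to edit a few dozen, which would defeat the purpose of cleaning up the query.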

    SELECT * FROM [data_log] d
INNER JOIN [CityStateInfo] c
        ON d.[h_data] = c.[state_long]
     WHERE d.[field] = '[MailingState]'
       AND LEN([h_data]) > 3
       AND [h_data] IS NOT NULL
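To put a number on that fan-out without returning every joined row, a grouped count is probably enough. This is only a sketch against the sample tables above; the `matching_rows` alias is made up for illustration.

```sql
-- For each log row that would be updated, count how many
-- CityStateInfo rows the join matches it to.
SELECT d.[id],
       d.[h_data],
       COUNT(*) AS matching_rows   -- illustrative alias, not part of the schema
  FROM [data_log] d
INNER JOIN [CityStateInfo] c
        ON d.[h_data] = c.[state_long]
 WHERE d.[field] = '[MailingState]'
   AND LEN(d.[h_data]) > 3
 GROUP BY d.[id], d.[h_data]
 ORDER BY matching_rows DESC;
```

With the sample rows above, each of the five state entries matches two [CityStateInfo] rows; on the full table the counts would be far larger.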

So, how can I change the UPDATE query so that it matches only a single row for each edit, instead of spinning more cycles than necessary?

SQL Fiddle

sql sql-server sql-update
1 Answer

Since you only need state_long and state_abbr from CityStateInfo:

    SELECT *
      FROM [data_log] d
INNER JOIN (
           select distinct csi.state_long
           , csi.state_abbr
           from [CityStateInfo] csi
           ) c
        ON d.[h_data] = c.[state_long]
     WHERE d.[field] = '[MailingState]'
       AND LEN([h_data]) > 3

...produces a smaller output, so...

    UPDATE [data_log]
       SET [data_log].[h_data] = c.[state_abbr]
      FROM [data_log] d
INNER JOIN (
           select distinct csi.state_long
           , csi.state_abbr
           from [CityStateInfo] csi
           ) c
        ON d.[h_data] = c.[state_long]
     WHERE d.[field] = '[MailingState]'
       AND LEN([h_data]) > 3

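...will involve less data. But it also involves an extra processing step. You should check the performance of both approaches to determine which works best in your environment.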
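One way to run that comparison is sketched below. It makes two assumptions that are not in the question. First, that each state_long should map to exactly one state_abbr; the first query lists any state names where that does not hold, in which case the DISTINCT derived table returns more than one row per state and the joined UPDATE can pick either abbreviation. Second, it uses CROSS APPLY with TOP (1) as a third variant to measure. That is a different technique from the DISTINCT join above, and closer to the original correlated subquery. The `abbr_count` alias is illustrative only.

```sql
-- 1) Sanity check: state names that map to more than one abbreviation.
--    If this returns rows, the joined UPDATE is not deterministic for them.
SELECT csi.[state_long],
       COUNT(DISTINCT csi.[state_abbr]) AS abbr_count   -- illustrative alias
  FROM [CityStateInfo] csi
 GROUP BY csi.[state_long]
HAVING COUNT(DISTINCT csi.[state_abbr]) > 1;

-- 2) Trial run: report I/O and timing, then undo the change.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

BEGIN TRANSACTION;

    -- Alternative form: fetch at most one abbreviation per log row.
    -- ORDER BY makes the chosen row predictable if duplicates exist.
    UPDATE d
       SET d.[h_data] = c.[state_abbr]
      FROM [data_log] d
     CROSS APPLY (
           SELECT TOP (1) csi.[state_abbr]
             FROM [CityStateInfo] csi
            WHERE csi.[state_long] = d.[h_data]
            ORDER BY csi.[state_abbr]
           ) c
     WHERE d.[field] = '[MailingState]'
       AND LEN(d.[h_data]) > 3;

ROLLBACK TRANSACTION;   -- trial only: nothing is kept

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Swapping the UPDATE inside the transaction for either of the other two forms and comparing the logical reads and elapsed time in the Messages output should show which one your server prefers. If adding an index is an option, one on [State_long] that includes [State_abbr] would likely help all three, since without it each form has to scan the 3.7 million rows.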
