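On a database server with plenty of horsepower (64 GB RAM, 16 cores), a MERGE against a table with 40 million rows was taking many hours.
I load the data into a staging table with SqlBulkCopy and then move it into the target table with MERGE. The MERGE runs in batches to minimize the impact on TempDb.
The gist of my statement is: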
DECLARE @RowID int = 0,
        @RowCount int,
        @Batches int = 0,
        @BatchSize int = 10000

-- Upper bound for the loop; this assumes the staging IDs are contiguous
SELECT @RowCount = COUNT(1)
FROM [someStagingTable]

WHILE @RowID <= @RowCount
BEGIN
    -- Open the transaction that the COMMIT at the end of the batch closes
    BEGIN TRANSACTION

    -- Merge one ID range of the staging table per iteration
    MERGE INTO [someTargetTable] AS Target
    USING (SELECT * FROM [someStagingTable]
           WHERE ID BETWEEN @RowID AND @RowID + @BatchSize - 1) AS Source
    ON Target.AccountNumber = Source.AccountNumber
    WHEN MATCHED THEN
        UPDATE
        SET ...
    WHEN NOT MATCHED BY TARGET THEN
        INSERT ...;

    SET @RowID = @RowID + @BatchSize
    SET @Batches = @Batches + 1

    COMMIT
END
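One detail of the loop above is worth a note: COUNT(1) only equals the highest ID when the identity values are contiguous and start at 1. Identity columns can have gaps (deleted rows, rolled-back inserts), and in that case the last rows would never fall into a batch. The following is a minimal sketch of an alternative loop bound based on MIN(ID) and MAX(ID). It is not taken from the original statement; the temp table and its single data column are hypothetical stand-ins for the real staging table.

```sql
-- Hypothetical stand-in for the staging table, for illustration only
CREATE TABLE #someStagingTable (
    ID int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    AccountNumber int NOT NULL
);

INSERT INTO #someStagingTable (AccountNumber)
VALUES (1001), (1002), (1003);

-- Simulate a gap in the identity values
DELETE FROM #someStagingTable WHERE ID = 2;

DECLARE @RowID int,
        @MaxID int,
        @BatchSize int = 10000;

-- Derive the loop bounds from the IDs that actually exist
SELECT @RowID = MIN(ID), @MaxID = MAX(ID)
FROM #someStagingTable;

-- If the table is empty both values are NULL and the loop is skipped
WHILE @RowID <= @MaxID
BEGIN
    -- The batched MERGE would go here; this sketch only counts the rows in the range
    SELECT COUNT(*) AS RowsInBatch
    FROM #someStagingTable
    WHERE ID BETWEEN @RowID AND @RowID + @BatchSize - 1;

    SET @RowID = @RowID + @BatchSize;
END

DROP TABLE #someStagingTable;
```

If the staging table is always truncated and reloaded in one pass, the COUNT-based bound may well be sufficient; the MIN/MAX variant simply removes the dependency on that assumption.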
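About the indexes:
[someStagingTable].ID is an int identity column with a clustered index.
[someTargetTable].AccountNumber is indexed.
In this particular case, despite the obvious index on [someTargetTable].AccountNumber, the MERGE statement needed an index hint: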
MERGE INTO [someTargetTable] WITH (INDEX=IX_someTargetTableAccountNumber) AS Target
USING (SELECT * FROM [someStagingTable] WHERE ID BETWEEN @RowID AND @RowID + @BatchSize - 1) AS Source
ON Target.AccountNumber = Source.AccountNumber
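This became apparent when examining the SQL execution plan, which started with a TABLE SCAN of [someTargetTable] in every batch. Adding the index hint cut the execution time per batch from about 300 seconds to about 1 second. The execution plan changed from a TABLE SCAN to an INDEX SEEK, and the number of rows read dropped from 40M to 10K (the batch size).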
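For anyone who wants to check the same thing on their own tables, here is a rough, self-contained sketch of one way to compare reads and timings for a single batch. It uses SET STATISTICS IO and SET STATISTICS TIME, which print per-table read counts and elapsed time to the Messages output. The temp tables, the Balance column, and the sample rows are hypothetical; only the index name and the shape of the MERGE come from the statement above.

```sql
-- Hypothetical schema, for illustration only
CREATE TABLE #someTargetTable (
    AccountNumber int NOT NULL,
    Balance decimal(18, 2) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_someTargetTableAccountNumber
    ON #someTargetTable (AccountNumber);

CREATE TABLE #someStagingTable (
    ID int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    AccountNumber int NOT NULL,
    Balance decimal(18, 2) NOT NULL
);

INSERT INTO #someStagingTable (AccountNumber, Balance)
VALUES (1001, 10.00), (1002, 20.00);

-- Report logical reads and elapsed time for the statements that follow
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- One batch, with the index hint on the target table
MERGE INTO #someTargetTable WITH (INDEX = IX_someTargetTableAccountNumber) AS Target
USING (SELECT AccountNumber, Balance
       FROM #someStagingTable
       WHERE ID BETWEEN 1 AND 10000) AS Source
ON Target.AccountNumber = Source.AccountNumber
WHEN MATCHED THEN
    UPDATE SET Target.Balance = Source.Balance
WHEN NOT MATCHED BY TARGET THEN
    INSERT (AccountNumber, Balance)
    VALUES (Source.AccountNumber, Source.Balance);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

DROP TABLE #someStagingTable;
DROP TABLE #someTargetTable;
```

Running one batch with and without the WITH (INDEX = ...) hint and comparing the logical reads reported for the target table should show whether the optimizer is scanning or seeking. On tables this small the numbers will not be meaningful; the sketch only shows the mechanics.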