为大数据集速度索引匹配

问题描述 投票:0回答:4
输出表(WS_PS)的行约为150,000行,B列中的每个值都需要循环以填充其他列。

WS_PS示例B列B作为查找值,列E列,如果B中的一行不是空的。

查找表(WS_PRIIP)有〜50,000行

    WS_PRIIPS示例
  • 列G列对应于WS_P列B中的查找值,其值来自E.
  • 列。
附录1:循环穿过每一行并使用工作表函数-12分钟运行时

On Error Resume Next last_row_ps = ws_ps.UsedRange.Rows.Count For ps_row = 2 To last_row_ps ws_ps.Cells(ps_row, 5).Value = WorksheetFunction.IfError(WorksheetFunction.Index(ws_priips.Range("A:G"), _ WorksheetFunction.Match(ws_ps.Cells(ps_row, 2), ws_priips.Range("G:G"), 0), 5), "") Next ps_row On Error GoTo -1

附件2:如果满足条件,将工作表加载到阵列中,并写入工作表-15分钟运行时
  • last_row_ps = ws_ps.UsedRange.Rows.Count last_row_priips = ws_priips.UsedRange.Rows.Count ps_array = ws_ps.Range("A1:X" & last_row_ps).Value priips_array = ws_priips.Range("A1:AZ" & last_row_priips).Value For ps_row = 2 To UBound(ps_array, 1) For priips_row = 2 To UBound(priips_array, 1) If ps_array(ps_row, 2) <> "" Then If ps_array(ps_row, 2) = priips_array(priips_row, 7) Then ws_ps.Cells(ps_row, 5).Value = priips_array(priips_row, 5) GoTo SkipLoop End If Else GoTo SkipLoop End If Next priips_row SkipLoop: Next ps_row
i为1列实施了这些解决方案,但我需要将其应用于10。 有没有办法可以大幅度地加快速度,而无需诉诸于工作表或使用python?

将查找表的数据读取到2D数组中,然后以ISIN值作为键构建字典对象,而数组“行”数字作为值。使用字典从数组中提取匹配的行值,以填充填充的表格。如果您首先从输出数据集表中拉出“要填充”和ISIN列,请更快地使用字典和查找阵列填充它们,然后最后将它们写回输出表。基本上尝试避免进行任何细胞操作。

使用下面的代码,两个表包含150k行(随机排序),我在3-4秒的第一表中填写了两个列。

注意,如果您需要跨平台支持,则可以使用collection代替词典,该字典在Mac上不可用。

Sub Tester() Dim arrDest, arrSrc, ws As Worksheet, isin As String Dim dict As Object, r As Long, rMatch As Long Dim rngDest As Range 'both my tables are on one sheet for testing Set ws = ThisWorkbook.Worksheets("Sheet1") 'table to be filled Set rngDest = ws.Range("A1").CurrentRegion arrDest = rngDest.Value 'read to 2D array 'data used as lookup table, as an array arrSrc = ws.Range("I1").CurrentRegion.Value Set dict = CreateObject("scripting.dictionary") 'map isin to row number in lookup table array For r = 2 To UBound(arrSrc, 1) isin = arrSrc(r, 1) If Len(isin) > 0 Then dict(isin) = r 'assumes no duplicates in lookup table Next r 'loop the report table and try to match values in the lookup table For r = 2 To UBound(arrDest, 1) isin = arrDest(r, 1) If Len(isin) > 0 Then If dict.Exists(isin) Then 'have a match? rMatch = dict(isin) 'matched row arrDest(r, 3) = arrSrc(rMatch, 2) 'copy a couple of values arrDest(r, 5) = arrSrc(rMatch, 3) End If 'have match End If 'not zero-length Next r rngDest.Value = arrDest 'replace data with updated array End Sub
我的测试表(较短版):



added:

对于需要跨平台支持(Windows/Mac)的任何人,这是使用词典来代替词典的基本“行映射”方法。 用150k项目加载该集合的时间比加载相同大小的字典要长约10%,但是通过键检索实际上是使用集合约为5x
excel vba optimization index-match
4个回答
6
投票
。对于大量的键,收集性能量表似乎更好 - 对于400k项和读取,大约是。 2.2和0.7秒,但对于字典,大约是。 10.5和10秒(慢得多)

'Using a Collection like a dictionary for mapping column values ' to their position in a dataset Sub CollectionTest() Dim arr, r As Long, col As Collection, t, k As String, v Set col = New Collection 'Source range A1:A150000 filled with "Val_000001", "Val_000002", etc ' sorted randomly arr = Range("A1").CurrentRegion.Value t = Timer For r = 1 To UBound(arr, 1) k = arr(r, 1) 'Add the row number as value and cell content as key ' note your row keys should be unique If IsEmpty(KeyValue(col, k)) Then col.Add r, k Next r Debug.Print col.Count & " items" Debug.Print "Loaded row map in " & Timer - t t = Timer For r = 1 To 150000 k = "Val_" & Format(r, "000000") v = KeyValue(col, k) If r < 5 Then Debug.Print "Key " & k & " at row# " & v Next r Debug.Print "Retrieved values in " & Timer - t End Sub 'Retrieve value for key `k` from collection `col` ' Returns Empty if there's no such key Function KeyValue(col As Collection, k As String) On Error Resume Next 'ignore error if no match for `k` KeyValue = col.Item(k) End Function

dictionary是匹配钥匙值的最佳选择。

读:
分析员洞穴 -  excel vlookup vs索引匹配与sql vsvba


观察:ExcelVBA简介第39部分 - 词典test data

Sub UpdateWSPD(ws_ps As Worksheet, ws_priips As Worksheet) Const priips_IDColumn As Long = 7 Const ps_IDColumn As Long = 2 Const priips_ValueColumn As Long = 5 Const ps_ValueColumn As Long = 5 Const KeyPrefix As String = "Key" Dim psData As Variant, priipsData As Variant psData = ws_ps.Range("A1").CurrentRegion.Value priipsData = ws_priips.Range("A1").CurrentRegion.Value Dim Map As Object Set Map = CreateObject("Scripting.Dictionary") Dim Key As String Dim r As Long For r = 2 To UBound(priipsData) Key = KeyPrefix & priipsData(r, priips_IDColumn) Map(Key) = r Next Dim priipsRow As Long For r = 2 To UBound(ws_ps) Key = KeyPrefix & ws_ps(r, ps_IDColumn) If Map.Exists(Key) Then priipsRow = Map(Key) ws_ps(r, ps_ValueColumn) = priipsData(priipsRow, priips_ValueColumn) End If Next Application.ScreenUpdating = False ws_ps.Range("A1").CurrentRegion.Value = ws_ps Application.ScreenUpdating = True End Sub

power查询一个选项? 未经测试,因此您可能需要调整一两件事: let // Load 'ws_ps' and 'ws_priips' queries ps_query = Excel.CurrentWorkbook(){[Name="ws_ps"]}[Content], priips_query = Excel.CurrentWorkbook(){[Name="ws_priips"]}[Content], // Merge 'ps_query' with 'priips_query' using Column B and Column G merged_table = Table.NestedJoin(ps_query, "Column B", priips_query, "Column G", "priips_data", JoinKind.LeftOuter), // Expand the 5th column (E) from the merged data expanded_table = Table.ExpandTableColumn(merged_table, "priips_data", {"Column E"}), // Replace null values with an empty string result_table = Table.ReplaceValue(expanded_table, null, "", Replacer.ReplaceValue, {"Column E"}) in result_table



4
投票
):

Option Explicit Sub Way2() Dim arrDest, arrSrc, ws As Worksheet, rngDest As Range, d&, s&, r& Application.ScreenUpdating = False Set ws = ThisWorkbook.ActiveSheet Set rngDest = ws.Range("A1").CurrentRegion r = rngDest.Rows.Count [B2].Resize(r - 1, 1) = Application.Evaluate("SEQUENCE(" & (r - 1) & ")") rngDest.Sort Key1:=[A:A], Header:=xlYes arrDest = rngDest.Value arrSrc = ws.Range("I1").CurrentRegion.Value s = 2 For d = 2 To r While arrDest(d, 1) <> arrSrc(s, 1) s = s + 1 Wend arrDest(d, 3) = arrSrc(s, 2) arrDest(d, 5) = arrSrc(s, 3) Next rngDest.Value = arrDest rngDest.Sort Key1:=[B:B], Header:=xlYes [B2].Resize(r - 1, 1).ClearContents Application.ScreenUpdating = True End Sub

I在50000参考表和300000数据表上对其进行了测试。尽管有其他操作,但此方法的速度比使用字典

慢15%。预计参考表被排序(这是典型的),否则可以将其与数据表相同。
如果字典不适用,则该方法可能会有所帮助,例如。 g。如果需要比较范围。
limitation:数据表应仅包含合格的键。但可以解决。

1
投票

Public Declare PtrSafe Function GetTickCount Lib "kernel32.dll" () As Long Sub Anysub() Dim t As Long t = GetTickCount ' payload code here Debug.Print GetTickCount - t End Sub

没有密钥完整性的数据表的子例程:

0
投票

Sub Way2_NoIntegrity() Dim arrDest, arrSrc, ws As Worksheet, d&, s&, r&, m&, p&, rngDest As Range Application.ScreenUpdating = False Set ws = ThisWorkbook.ActiveSheet Set rngDest = ws.Range("A1").CurrentRegion r = rngDest.Rows.Count [B2].Resize(r - 1, 1) = Application.Evaluate("SEQUENCE(" & (r - 1) & ")") rngDest.Sort Key1:=[A:A], Header:=xlYes arrDest = rngDest.Value arrSrc = ws.Range("I1").CurrentRegion.Value p = UBound(arrSrc, 1) s = 2: m = 2 For d = 2 To r While arrDest(d, 1) <> arrSrc(s, 1) s = s + 1 If s > p Then d = d + 1 If d > r Then Exit For s = m End If Wend m = s arrDest(d, 3) = arrSrc(s, 2) arrDest(d, 5) = arrSrc(s, 3) Next rngDest.Value = arrDest rngDest.Sort Key1:=[B:B], Header:=xlYes [B2].Resize(r - 1, 1).ClearContents Application.ScreenUpdating = True End Sub

用于范围比较的子例程:

Sub Way2RangeComp() Dim arrDest, arrSrc, ws As Worksheet, d&, s&, r&, m&, p&, rngDest As Range Application.ScreenUpdating = False Set ws = ThisWorkbook.ActiveSheet Set rngDest = ws.Range("A1").CurrentRegion r = rngDest.Rows.Count [B2].Resize(r - 1, 1) = Application.Evaluate("SEQUENCE(" & (r - 1) & ")") rngDest.Sort Key1:=[A:A], Header:=xlYes arrDest = rngDest.Value arrSrc = ws.[I1].CurrentRegion.Value p = UBound(arrSrc, 1) s = 2: m = 2 For d = 2 To r While arrDest(d, 1) > arrSrc(s, 1) s = s + 1 If s > p Then d = d + 1 If d > r Then Exit For s = m End If Wend m = s arrDest(d, 3) = arrSrc(s, 2) arrDest(d, 5) = arrSrc(s, 3) Next rngDest.Value = arrDest rngDest.Sort Key1:=[B:B], Header:=xlYes [B2].Resize(r - 1, 1).ClearContents Application.ScreenUpdating = True End Sub enter image description here

更有效的效率(使用词典好20%!)对于没有密钥完整性的数据表的算法(当然,包括完整密钥完整性的情况):

Sub Way2_NoIntegrity_Advanced() Dim arrDest, arrSrc, arrInd, ws As Worksheet, rngDest As Range Dim d&, s&, r&, m&, p&, ind& Application.ScreenUpdating = False Set ws = ThisWorkbook.ActiveSheet : Set rngDest = Range(ws.[E2], ws.[A2].End(xlDown)) arrDest = rngDest.Value : r = UBound(arrDest, 1) arrInd = Application.SortBy(Application.Evaluate("SEQUENCE(" & r & ")"), _ rngDest.Columns(1).Value) arrSrc = ws.Range("I1").CurrentRegion.Value : p = UBound(arrSrc, 1) : s = 2: m = 2 For d = 1 To r ind = arrInd(d, 1) While arrDest(ind, 1) <> arrSrc(s, 1) s = s + 1 If s > p Then d = d + 1: ind = arrInd(d, 1) If d > r Then Exit For s = m End If Wend m = s : arrDest(ind, 3) = arrSrc(s, 2) : arrDest(ind, 5) = arrSrc(s, 3) Next rngDest.Value = arrDest Application.ScreenUpdating = True End Sub

最新问题
© www.soinside.com 2019 - 2025. All rights reserved.