我有一张订单表,有 5000 个订单。我有一张顾客表,有 5000 名顾客。
有些订单的“CustomerID”字段值为空,我想用算法修复该字段。
目前,对于订单表中“CustomersID”处的每个空值,算法会考虑订单表中的三个字段(1.名称、2.国家/地区、3.集群),并循环遍历客户表以查找具有相同字段的匹配项在客户表中(1.名称,2.国家/地区,3.集群)。如果存在匹配,我将在订单表中添加“CustomerID”(替换“CustomersID”处的空值,否则我首先在客户表中创建一个新的客户记录,然后在订单表中添加“CustomerID”。
问题在于算法缓慢。当循环遍历订单表中具有空值的所有字段(最坏情况 5000)并尝试在客户表中为每个字段查找匹配项(如果没有匹配,最坏情况 5000)时,总时间复杂度变为二次。 N^2。
提前致谢! 附件是我的代码。第一种方法打开订单表(rss)的记录集。然后它循环遍历订单记录集以查找空值。第二种方法打开客户表(第一个)的记录集。然后循环查找订单表共享字段中的匹配项(1.名称 2.国家/地区 3.集群)。
Public Sub TransferNullValues() 'hard coded parameters are "Orders", "Customers"
Set coll = New Collection
coll.Add "CustomerID"
coll.Add "End_customer"
coll.Add "End_customer_country"
coll.Add "Cluster"
Dim sqlStr As String
sqlStr = "SELECT " & buildSql(coll) & " FROM Orders where Orders.CustomerID is null;"
coll.Remove (1) 'remove customer id
Dim rss As DAO.Recordset
Set rss = Basic.getRs(sqlStr)
Do While Not rss.EOF
Dim locals As Collection
Set locals = New Collection
Dim v As Variant
For Each v In coll
locals.Add (rss(v).value)
Next
Dim id As Integer
id = getCustID(locals)
rss.Edit
rss("CustomerID").value = id
rss.Update
rss.MoveNext
Loop
rss.Close
Set rss = Nothing
End Sub
Private Function getCustID(locals As Collection) As Integer
Dim sqlStr As String
sqlStr = "SELECT * FROM CUSTOMERS;"
Dim rst As DAO.Recordset
Set rst = Basic.getRs(sqlStr)
Dim trgt As Collection
Do While Not rst.EOF
Set trgt = New Collection
Dim v As Variant
For Each v In coll
trgt.Add (rst(v).value)
Next
Dim allExists As Boolean
allExists = True
For Each v In locals
If Not Basic.Exists(trgt, v) Then
allExists = False
Exit For
End If
Next
If allExists Then
getCustID = rst("CustomerID").value
'Debug.Print "customer exists in customers table"
Exit Function
End If
rst.MoveNext
Loop
rst.AddNew
Dim count As Integer
count = 1
For Each v In locals
rst.Fields(count).value = v
count = count + 1
Next
Dim id As Integer
id = rst("CustomerID")
rst.Update
rst.Close
Set rst = Nothing
getCustID = id
End Function
我想过如何降低时间复杂度,但没有找到解决方案。
您可以使用 UPDATE 查询更有效地完成此操作:
UPDATE Orders
INNER JOIN Customers ON
(Orders.name = Customers.name) AND
(Orders.country = Customers.country) AND
(Orders.cluster = Customers.cluster)
SET Orders.CustomerId = Customers.CustomerId
WHERE Orders.CustomerId Is Null
JOIN 比 O(n2) 更高效。我的猜测是 O(n·log(n))。