我需要在处理列表的每次迭代中获取列表中出现次数最多的 ID。它是一个包含正/负 ID 的巨大列表。如果 ID 为正,则增加该 ID 的计数;如果为负,则减少该 ID 的计数。 考虑输入列表如下:
input_list = [6,6,6,2,2,-6,-6,-6,-2,-2]
input_list[0] = 6 -> Count of ID 6:1 ->max count at this iteration is 1
input_list[1] = 6 -> Count of ID 6:2 ->max count at this iteration is 2
input_list[2] = 6 -> Count of ID 6:3 ->max count at this iteration is 3
input_list[3] = 2 -> Count of ID 6:3, Count of ID 2:1 ->max count at this iteration is 3
input_list[4] = 2 -> Count of ID 6:3, Count of ID 2:2 ->max count at this iteration is 3
input_list[5] = -6 -> Count of ID 6:2, Count of ID 2:2 ->max count at this iteration is 2
input_list[6] = -6 -> Count of ID 6:1, Count of ID 2:2 ->max count at this iteration is 1
and so on..
预期输出:
[1,2,3,3,3,2,2,2,1,0]
注:-6表示ID 6的计数减1。
def getMAxIDAfterEachUpdate(input_list):
res = []
dic = {}
for i in range(len(input_list)):
tmp = abs(input_list[i])
if tmp not in dic:
tmp = 1
else:
# ID is negative, we decrement its count
if input_list[i] < 0:
dic[tmp] -= 1
else: #its positive, increment its count
dic[tmp] += 1
res.append(max(dic.values())
return res
每次迭代时在字典中查找最大值的成本很高。有更好的方法找到结果吗?
使用简单的字典来存储每个ID的计数的方法的主要问题是字典值是无序的,因此在每次迭代中必须对字典值执行完整扫描以获得最大值,从而导致时间复杂度O(n x m),其中 n 是输入列表的大小,m 是唯一 ID 的数量。
由于我们知道给定 ID 的计数在每次迭代中只能递增或递减 1,因此优化算法的一种方法是将 ID 计数以及共享该计数的 ID 数量存储在 a 的节点中。改为双链表,每次迭代时将节点中存储的计数按升序排列,这样需要O(1)的时间复杂度来获取最大计数,并且该最大计数总是在最后一个节点,从而提高了总体时间复杂度为 O(n)。
其中
nodes
是将 ID 映射到节点的字典,counts
是双向链表,其中每个节点存储一对共享计数的 ID 的 count
和 number
,使用 的示例输入列表[6, 6, 6, 2, 2, -6, -6, -6, -2, -2]
说明每次迭代结束时的结果:
Iteration 1, ID = 6:
nodes = {6: node1}
counts = [node1(count=1, number=1)]
Iteration 2, ID = 6:
nodes = {6: node1}
counts = [node1(count=2, number=1)]
Iteration 3, ID = 6:
nodes = {6: node1}
counts = [node1(count=3, number=1)]
Iteration 4, ID = 2:
nodes = {6: node1, 2: node2}
counts = [node2(count=1, number=1), node1(count=3, number=1)]
Iteration 5, ID = 2:
nodes = {6: node1, 2: node2}
counts = [node2(count=2, number=1), node1(count=3, number=1)]
Iteration 6, ID = -6:
nodes = {6: node2, 2: node2}
counts = [node2(count=2, number=2)]
Iteration 7, ID = -6:
nodes = {6: node1, 2: node2}
counts = [node1(count=1, number=1), node2(count=2, number=1)]
Iteration 8, ID = -6:
nodes = {6: node1, 2: node2}
counts = [node1(count=0, number=1), node2(count=2, number=1)]
Iteration 9, ID = -2:
nodes = {6: node1, 2: node2}
counts = [node1(count=0, number=1), node2(count=1, number=1)]
Iteration 10, ID = -2:
nodes = {6: node1, 2: node1}
counts = [node1(count=0, number=2)]
llist
模块实现双链表的算法:
from llist import dllist
from collections import namedtuple
count_number = namedtuple('count_number', ('count', 'number'))
def max_count_per_update(ids):
max_counts = []
nodes = {}
counts = dllist()
for id in ids:
if not (node := nodes.get(key := abs(id))):
if counts and (first := counts.first).value.count == 0:
node = first
else:
node = counts.appendleft(count_number(0, 1))
new_count = (current_node := node).value.count + (-1 if id < 0 else 1)
next_node = node.prev if id < 0 else node.next
if next_node and (next_value := next_node.value).count == new_count:
(node := next_node).value = count_number(new_count, next_value.number + 1)
else:
insert = counts.insertbefore if id < 0 else counts.insertafter
node = insert(count_number(new_count, 1), node)
if new_number := (current_value := current_node.value).number - 1:
current_node.value = count_number(current_value.count, new_number)
else:
counts.remove(current_node)
nodes[key] = node
max_counts.append(counts.last.value.count)
return max_counts
这样:
input_list = [6, 6, 6, 2, 2, -6, -6, -6, -2, -2]
print(max_count_per_update(input_list))
输出:
[1, 2, 3, 3, 3, 2, 2, 2, 1, 0]