我在一个大型科技巨头的面对面采访中被问到这个算法问题。我无法很好地解决它,从那以后一直困扰着我。这是问题和我尝试解决它。
Question:
Design a should_throttle function that takes in a stream of requests.
Each request has the following format: {'client_id', 'msg_id', 'timestamp'}.
@return true if client has exceeded 1000 requests in one minute
@return false if client has NOT exceeded 1000 requests in one minute.
我试图解决的问题
def should_throttle(msg, seconds_limit=60, count_limit=1000):
timestamp = msg['timestamp']
client_id = msg['client_id']
# msg_id = msg['id']
global_dict = {} # key is timestamp, value is list of clients
# we care only about last 60 seconds
# so get all the keys that are within last 60 seconds OF THE CURRENT SEEN KEY
left_range = timestamp - datetime.timedelta(seconds=seconds_limit)
right_range = timestamp
keys_to_delete = []
client_count = 0
for key in global_dict.keys(): # O(60)
if key < left_range:
keys_to_delete.append(key)
continue
if key >= left_range and key <= right_range:
if client_id in global_dict[key]: # O(1)
client_count += global_dict[key].count(client_id)
continue
# delete unnecessary keys to avoid being bloated
for key in keys_to_delete: # O(1000)
del global_dict[key]
# add the current msg as seen
if timestamp in global_dict:
global_dict[timestamp].append(client_id) # we care only about client id and wethere its seen or not
else:
global_dict[timestamp] = [client_id]
if client_count >= count_limit:
return True # throttle it
return False
我在一个时间窗口内跟踪每个请求的频率,并清理窗口外的字典中的条目。
我认为BigOh分析是:
n = max seconds window
m = max client limit
==> O(n * m)
有什么建议/改进吗?我真的觉得我错过了一个更好的方法。
您可以通过客户端创建collections.deque,将它们存储在defaultdict中。
然后,对于每个传入消息,来自deque的popleft
所有时间戳太旧,添加新消息,并检查是否已超出限制。
from collections import defaultdict,deque
def should_throttle(messages, seconds_limit=60, count_limit=1000):
recent = defaultdict(deque)
for message in messages:
timestamp = message['timestamp']
client = message['client_id']
# remove the timestamps that are more than seconds_limit old
try:
while recent[client][0] < timestamp - seconds_limit:
recent[client].popleft()
except IndexError:
# deque was empty
pass
# append the current one
recent[client].append(timestamp)
# just to show the current state of affairs...
print(client, recent[client])
# has this client exceeded his limit?
if len(recent[client]) > count_limit:
return True
# we never exceeded the count limit
return False
一些测试:
should_throttle([{'client_id': 1, 'msg_id':'', 'timestamp':1},
{'client_id': 2, 'msg_id':'', 'timestamp':20},
{'client_id': 1, 'msg_id':'', 'timestamp':21},
{'client_id': 1, 'msg_id':'', 'timestamp':22},
{'client_id': 2, 'msg_id':'', 'timestamp':23},
{'client_id': 1, 'msg_id':'', 'timestamp':24},
],
seconds_limit=4, count_limit=2)
输出:
1 deque([1])
2 deque([20])
1 deque([21])
1 deque([21, 22])
2 deque([20, 23])
1 deque([21, 22, 24])
True
有更高的限制:
should_throttle([{'client_id': 1, 'msg_id':'', 'timestamp':1},
{'client_id': 2, 'msg_id':'', 'timestamp':20},
{'client_id': 1, 'msg_id':'', 'timestamp':21},
{'client_id': 1, 'msg_id':'', 'timestamp':22},
{'client_id': 2, 'msg_id':'', 'timestamp':23},
{'client_id': 1, 'msg_id':'', 'timestamp':24},
],
seconds_limit=4, count_limit=4)
输出:
1 deque([1])
2 deque([20])
1 deque([21])
1 deque([21, 22])
2 deque([20, 23])
1 deque([21, 22, 24])
False