设计一个should_throttle函数,该函数根据特定时间窗口限制请求

问题描述 投票:1回答:1

我在一个大型科技巨头的面对面采访中被问到这个算法问题。我无法很好地解决它,从那以后一直困扰着我。这是问题和我尝试解决它。

Question:
Design a should_throttle function that takes in a stream of requests.
Each request has the following format: {'client_id', 'msg_id', 'timestamp'}.
@return true if client has exceeded 1000 requests in one minute
@return false if client has NOT exceeded 1000 requests in one minute.

我试图解决的问题

def should_throttle(msg, seconds_limit=60, count_limit=1000):
    timestamp = msg['timestamp']
    client_id = msg['client_id']
    # msg_id = msg['id']

    global_dict = {} # key is timestamp, value is list of clients

    # we care only about last 60 seconds
    # so get all the keys that are within last 60 seconds OF THE CURRENT SEEN KEY
    left_range = timestamp - datetime.timedelta(seconds=seconds_limit)
    right_range = timestamp

    keys_to_delete = []
    client_count = 0
    for key in global_dict.keys(): # O(60)
        if key < left_range:
            keys_to_delete.append(key)
            continue
        if key >= left_range and key <= right_range:
            if client_id in global_dict[key]: # O(1)
                client_count += global_dict[key].count(client_id)
                continue

    # delete unnecessary keys to avoid being bloated
    for key in keys_to_delete: # O(1000)
        del global_dict[key]

    # add the current msg as seen
    if timestamp in global_dict:
        global_dict[timestamp].append(client_id) # we care only about client id and wethere its seen or not
    else:
        global_dict[timestamp] = [client_id]

    if client_count >= count_limit:
        return True # throttle it

    return False

我在一个时间窗口内跟踪每个请求的频率,并清理窗口外的字典中的条目。

我认为BigOh分析是:

n = max seconds window
m = max client limit

==> O(n * m)

有什么建议/改进吗?我真的觉得我错过了一个更好的方法。

python algorithm
1个回答
0
投票

您可以通过客户端创建collections.deque,将它们存储在defaultdict中。

然后,对于每个传入消息,来自deque的popleft所有时间戳太旧,添加新消息,并检查是否已超出限制。

from collections import defaultdict,deque


def should_throttle(messages, seconds_limit=60, count_limit=1000):  
    recent = defaultdict(deque)

    for message in messages:
        timestamp = message['timestamp']
        client = message['client_id']
        # remove the timestamps that are more than seconds_limit old
        try:
            while recent[client][0] < timestamp - seconds_limit:
                recent[client].popleft()
        except IndexError:
            # deque was empty
            pass

        # append the current one
        recent[client].append(timestamp)

        # just to show the current state of affairs...
        print(client, recent[client])

        # has this client exceeded his limit?
        if len(recent[client]) > count_limit:
            return True

    # we never exceeded the count limit
    return False

一些测试:

should_throttle([{'client_id': 1, 'msg_id':'', 'timestamp':1},
                {'client_id': 2, 'msg_id':'', 'timestamp':20},
                {'client_id': 1, 'msg_id':'', 'timestamp':21},
                {'client_id': 1, 'msg_id':'', 'timestamp':22},
                {'client_id': 2, 'msg_id':'', 'timestamp':23},
                {'client_id': 1, 'msg_id':'', 'timestamp':24},
               ],
               seconds_limit=4, count_limit=2)

输出:

1 deque([1])
2 deque([20])
1 deque([21])
1 deque([21, 22])
2 deque([20, 23])
1 deque([21, 22, 24])

True

有更高的限制:

should_throttle([{'client_id': 1, 'msg_id':'', 'timestamp':1},
                {'client_id': 2, 'msg_id':'', 'timestamp':20},
                {'client_id': 1, 'msg_id':'', 'timestamp':21},
                {'client_id': 1, 'msg_id':'', 'timestamp':22},
                {'client_id': 2, 'msg_id':'', 'timestamp':23},
                {'client_id': 1, 'msg_id':'', 'timestamp':24},
               ],
               seconds_limit=4, count_limit=4)

输出:

1 deque([1])
2 deque([20])
1 deque([21])
1 deque([21, 22])
2 deque([20, 23])
1 deque([21, 22, 24])

False
© www.soinside.com 2019 - 2024. All rights reserved.