Python multiprocessing: multiple locks are slower than a single lock


I'm experimenting with multiprocessing in Python. I wrote some code in which 3 different variables (a dict, a float, and an int) need to be modified concurrently and shared across processes. My understanding of how locking works suggested that with 3 distinct shared variables, giving each one its own lock would be more efficient. After all, why should process 2 have to wait to modify variable A just because process 1 is modifying variable B? It makes sense to me that while variable B is locked, variable A should still be accessible to the other processes.

I ran the 2 toy examples below, based on the real program I'm writing, and to my surprise the single-lock version runs faster!

Single lock: 2.1 seconds

import multiprocessing as mp
import numpy as np
import time

class ToyClass:
    def __init__(self, shared_a, shared_b):
        self.a = shared_a
        self.b = shared_b

    def update_a(self, key, n, lock):
        with lock:
            if key not in self.a:
                self.a[key] = np.zeros(4)
            self.a[key][n] += 1

    def update_b(self, lock):
        with lock:
            self.b.value = max(0.1, self.b.value - 0.01)

def run_episode(toy, counter, lock):
    key = np.random.randint(100)
    n = np.random.randint(4)
    toy.update_a(key, n, lock)
    toy.update_b(lock)
    with lock:
        counter.value += 1

if __name__ == "__main__":
    num_episodes = 1000
    num_processes = 4

    t0 = time.time()

    with mp.Manager() as manager:
        shared_a = manager.dict()
        shared_b = manager.Value('d', 0)
        counter = manager.Value('i', 0)

        toy = ToyClass(shared_a=shared_a, shared_b=shared_b)

        # Single lock
        lock = manager.Lock()

        pool = mp.Pool(processes=num_processes)

        for _ in range(num_episodes):
            pool.apply_async(run_episode, args=(toy, counter, lock))

        pool.close()
        pool.join()

    tf = time.time()

    print(f"Time to compute single lock: {tf - t0} seconds")

Multiple locks: 2.85 seconds!!

import multiprocessing as mp
import numpy as np
import time


class ToyClass:  ## Same definition as for single lock
    def __init__(self, shared_a, shared_b):
        self.a = shared_a
        self.b = shared_b

    def update_a(self, key, n, lock):
        with lock:
            if key not in self.a:
                self.a[key] = np.zeros(4)
            self.a[key][n] += 1 

    def update_b(self, lock):
        with lock:
            self.b.value = max(0.1, self.b.value - 0.01)

def run_episode(toy, counter, lock_a, lock_b, lock_count):
    key = np.random.randint(100)
    n = np.random.randint(4)
    toy.update_a(key, n, lock_a)
    toy.update_b(lock_b)
    with lock_count:
        counter.value += 1

if __name__ == "__main__":
    num_episodes = 1000
    num_processes = 4

    t0 = time.time()

    with mp.Manager() as manager:
        shared_a = manager.dict()
        shared_b = manager.Value('d', 0)
        counter = manager.Value('i', 0)

        toy = ToyClass(shared_a=shared_a, shared_b=shared_b)

        # 3 locks for 3 shared variables
        lock_a = manager.Lock()
        lock_b = manager.Lock()
        lock_count = manager.Lock()

        pool = mp.Pool(processes=num_processes)

        for _ in range(num_episodes):
            pool.apply_async(run_episode, args=(toy, counter, lock_a, lock_b, lock_count))

        pool.close()
        pool.join()

    tf = time.time()

    print(f"Time to compute multi-lock: {tf - t0} seconds")

What am I missing here? Is there computational overhead in switching between locks that outweighs any potential benefit? These are essentially just flags, so how can that be?

Note: I know the code runs much faster as a single process/thread, but this is part of an experiment to understand exactly where the drawbacks of multiprocessing lie.

python multithreading locking
1 Answer

This has nothing to do with the locks themselves: every call now has to ship 3 locks to the worker instead of 1, which is 3 times the transfer overhead.
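One rough way to gauge that extra per-call payload is to pickle the argument tuples the way the pool would before sending each task. This is a sketch, not part of the original answer; it omits the ToyClass instance for brevity, and the exact byte counts will vary by platform and Python version:

import multiprocessing as mp
import pickle

if __name__ == "__main__":
    with mp.Manager() as manager:
        shared_a = manager.dict()
        counter = manager.Value('i', 0)
        single_lock = manager.Lock()
        lock_a, lock_b, lock_count = manager.Lock(), manager.Lock(), manager.Lock()

        # apply_async pickles its args tuple for every submitted task,
        # so each extra proxy adds to every task's payload.
        one = pickle.dumps((shared_a, counter, single_lock))
        three = pickle.dumps((shared_a, counter, lock_a, lock_b, lock_count))
        print(f"args with 1 lock:  {len(one)} bytes")
        print(f"args with 3 locks: {len(three)} bytes")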

To verify this, you can test:

  1. Keep sending 3 locks but only use 1 of them (as sketched below); you will get the same time as with 3 locks.
  2. Replace 2 of the locks with plain
    Value
    objects; still the same time as with 3 locks.
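A minimal sketch of the first check, reusing the multi-lock example above: all three lock proxies are still shipped with every task, but only lock_a is ever acquired.

def run_episode(toy, counter, lock_a, lock_b, lock_count):
    # Same per-task payload as the multi-lock version: lock_b and lock_count
    # are still pickled and sent with every task, they are just never used.
    key = np.random.randint(100)
    n = np.random.randint(4)
    toy.update_a(key, n, lock_a)
    toy.update_b(lock_a)
    with lock_a:
        counter.value += 1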

The locking itself plays no role here; you are just sending the same locks over and over. You can avoid that by passing them through an initializer when the pool spawns its worker processes.

lock_a = None
lock_b = None
lock_counter = None

def initialize_locks(val1, val2, val3):
    # Runs once in every worker process when the pool starts; the lock
    # proxies end up as module-level globals instead of per-task arguments.
    global lock_a, lock_b, lock_counter
    lock_a = val1
    lock_b = val2
    lock_counter = val3

...


pool = mp.Pool(processes=num_processes, initializer=initialize_locks,
               initargs=(lock_a, lock_b, lock_counter))
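For completeness, a sketch of how the worker side would then be wired up. This is an assumption that follows the snippet's variable names and the question's original script (numpy imported as np): run_episode stops taking the locks as arguments and uses the globals set by the initializer.

def run_episode(toy, counter):
    # The locks are no longer part of the task payload; each worker already
    # holds its own proxies in the module globals set by initialize_locks.
    key = np.random.randint(100)
    n = np.random.randint(4)
    toy.update_a(key, n, lock_a)
    toy.update_b(lock_b)
    with lock_counter:
        counter.value += 1

...

for _ in range(num_episodes):
    pool.apply_async(run_episode, args=(toy, counter))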