Python multiprocessing module and modifying a shared global variable

Question    Votes: 1    Answers: 1

I wrote a small Python program to check whether I understand how global variables are passed to "child" processes.

import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var

When I run it, I get:

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11]
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11]
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11]
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11]
6 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 100, 7, 8, 9, 10, 11]
7 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 100, 8, 9, 10, 11]
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

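This makes sense: the child processes modify the global variable, so the copy-on-write mechanism copies it at the moment a child writes to it, and any change is visible only in the process that made it.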

What surprised me was the result when I modified the code to also print the variable's identifier:

import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var, id(shared_var)
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var, id(shared_var)

I got:

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
6 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 100, 7, 8, 9, 10, 11] 4504973968
7 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 100, 8, 9, 10, 11] 4504973968
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11] 4504973968
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11] 4504973968
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968

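The identifier of the variable is the same everywhere (in the main process and in the spawned processes), whereas I was expecting a separate copy for each process...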

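Does anyone know why I get these results? Some references on how multiprocessing handles global variables that are read or written by the processes it creates would also be great. Thanks!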

python multiprocessing global-variables
1 Answer

Votes: 1

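I think there is some confusion about memory here. You are not using multithreading but multiprocessing, so each worker runs in a separate process with its own virtual memory space. Each process therefore has its own copy of shared_var from the start. That copy is what gets modified on each call to f(x), which leaves the actual variable in __main__ untouched.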

You can have a look at the section of the docs on sharing memory between processes, for example using multiprocessing.Array.
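As a minimal sketch of that suggestion (not the asker's code), the example below replaces the plain list with a multiprocessing.Array so that writes made in the workers become visible in the parent. It is written for Python 3, unlike the Python 2 snippets in the question. The helper names _init_worker and run are hypothetical, and the explicit 'fork' start method is an assumption that limits it to Unix-like systems.

```python
import multiprocessing


def _init_worker(arr):
    # Hypothetical helper: runs once in each worker and stores a
    # reference to the shared array in a module-level name.
    global shared_arr
    shared_arr = arr


def f(x):
    # The write goes to shared memory, so the parent sees it as well.
    shared_arr[x] = 100
    return x * x


def run():
    # Assumption: the 'fork' start method is available (Unix-like systems).
    ctx = multiprocessing.get_context('fork')
    arr = ctx.Array('i', range(12))  # 12 C ints in shared memory, with a lock
    pool = ctx.Pool(4, initializer=_init_worker, initargs=(arr,))
    try:
        results = pool.map(f, range(8))
    finally:
        pool.close()
        pool.join()
    return results, list(arr)


if __name__ == '__main__':
    results, final = run()
    print(results)
    print(final)
```

Here the final list printed by the parent contains the value 100 at indices 0 to 7, in contrast to the unchanged list in the question's output.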

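I am not 100% sure why the address stays the same, but I think it is because each new child process is created by forking the main process and copying its memory layout, so the address in virtual memory stays the same in every child. In CPython, id() returns the object's memory address, which is a virtual address. The physical memory behind it is of course different once a child has written to it. That is why you see the same id but different values.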
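The explanation above can be checked with a small experiment. This is a sketch in Python 3 that assumes CPython (where id() is the object's address) and the 'fork' start method on a Unix-like system; the names child and run are hypothetical. Each worker reports its process ID, the id() of the list, and the list's contents after modifying it.

```python
import multiprocessing
import os

shared_var = list(range(12))


def child(x):
    # Modifies this process's private copy of the list.
    shared_var[x] = 100
    return os.getpid(), id(shared_var), list(shared_var)


def run():
    # Assumption: forked children inherit the parent's memory layout.
    ctx = multiprocessing.get_context('fork')
    pool = ctx.Pool(2)
    try:
        reports = pool.map(child, range(4))
    finally:
        pool.close()
        pool.join()
    return reports


if __name__ == '__main__':
    print('parent', os.getpid(), id(shared_var), shared_var)
    for pid, ident, value in run():
        print('child ', pid, ident, value)
    print('parent', os.getpid(), id(shared_var), shared_var)
```

Under those assumptions the children report the same id as the parent but different process IDs, and the parent's list is unchanged after the pool finishes. The same virtual address in two processes refers to two independent objects.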
