Python multiprocessing gets slower with additional CPUs

Question

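I am trying to parallelize code that should be embarrassingly parallel, but it seems to get slower the more processes I use. Here is a minimal (dysfunctional) example: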

import os
import time
import random
import multiprocessing
from multiprocessing import Pool, Manager, Process

import numpy as np
import pandas as pd


def pool_func(
    number: int,
    max_number: int
) -> dict:
    
    pid = str(multiprocessing.current_process().pid)
    
    print('[{:2d}/{:2d} {:s}] Starting ...'.format(number, max_number, pid))
    t0 = time.time()
    # # the following takes ~10 seconds on a separate node
    # for i in range(2):
    #     print('[{:d}] Passed loop {:d}/2...'.format(number, i+1))
    #     time.sleep(5)
    
    # the following takes ~3.3 seconds on a separate node
    n = 1000
    for _ in range(50):
        u = np.random.randn(n, n)
        v = np.linalg.inv(u)
    
    t1 = time.time()
    print('[{:2d}/{:2d} {:s}] Finished in {:.1f} seconds.'.format(number, max_number, pid, t1 - t0))
    return {}
    

if __name__ == "__main__":
    runs = []
    count = 0
    while count < 50:
        runs.append(
            (count, 50)
        )
        count += 1
    
    print(f"Number of runs to perform: {len(runs):d}")
    
    num_cpus = 4
    print(f"Running job with {num_cpus:d} CPUs in parallel ...")
    
    # with Pool(processes=num_cpus) as pool:
    with multiprocessing.get_context("spawn").Pool(processes=num_cpus) as pool:
        results = pool.starmap(pool_func, runs)
    
    print('Main process done.')

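I want to point out three features. First, num_cpus can be changed to increase the number of workers in the pool. Second, I can switch from the default "fork" pool to the "spawn" method, which does not seem to change anything. Finally, inside pool_func, the work being run can be either a CPU-intensive matrix inversion or a wait function that uses no CPU.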

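When I use the wait function, each process runs in about the expected time, roughly 10 seconds. When I use the matrix inversion, the time per process grows with the number of processes, approximately as follows: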

1 CPU :  3 seconds  
2 CPUs:  4 seconds  
4 CPUs: 30 seconds  
8 CPUs: 95 seconds
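
To put those numbers in perspective, here is a small sketch that turns the approximate timings above into a per-task slowdown factor and an overall throughput. The helper names `slowdown` and `throughput` are made up for illustration; only the four timings come from the measurements above.

```python
# Approximate per-task timings reported above: number of workers -> seconds.
reported_seconds = {1: 3.0, 2: 4.0, 4: 30.0, 8: 95.0}


def slowdown(timings: dict) -> dict:
    """Per-task time relative to the single-worker run (1.0 means no slowdown)."""
    baseline = timings[1]
    return {workers: seconds / baseline for workers, seconds in timings.items()}


def throughput(timings: dict) -> dict:
    """Tasks completed per second, assuming every worker is always busy."""
    return {workers: workers / seconds for workers, seconds in timings.items()}


if __name__ == "__main__":
    factors = slowdown(reported_seconds)
    rates = throughput(reported_seconds)
    for workers in sorted(reported_seconds):
        print(f"{workers} workers: {factors[workers]:5.1f}x slower per task, "
              f"{rates[workers]:.3f} tasks/s")
```

With ideal scaling the slowdown factor would stay near 1 and the throughput would grow with the number of workers. With these timings the throughput at 4 and 8 workers is lower than with a single worker.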

Here is part of the output of the script above, run as is:

Number of runs to perform: 50
Running job with 4 CPUs in parallel ...
[ 0/50 581194] Starting ...
[ 4/50 581193] Starting ...
[ 8/50 581192] Starting ...
[12/50 581191] Starting ...
[ 0/50 581194] Finished in 24.7 seconds.
[ 1/50 581194] Starting ...
[ 4/50 581193] Finished in 29.3 seconds.
[ 5/50 581193] Starting ...
[12/50 581191] Finished in 30.3 seconds.
[13/50 581191] Starting ...
[ 8/50 581192] Finished in 32.2 seconds.
[ 9/50 581192] Starting ...
[ 1/50 581194] Finished in 26.9 seconds.
[ 2/50 581194] Starting ...
[ 5/50 581193] Finished in 30.3 seconds.
[ 6/50 581193] Starting ...
[13/50 581191] Finished in 30.8 seconds.
[14/50 581191] Starting ...
[ 9/50 581192] Finished in 32.8 seconds.
[10/50 581192] Starting ...
...

The process IDs look unique to me.

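Clearly something is wrong with the scaling, because adding more CPUs makes each process run slower. There is no I/O at all in the timed section. These are simple processes that I expected to work out of the box. I don't know why this isn't behaving as I thought it would. Why do the individual processes in this script take longer when I use more CPUs?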

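When I run it on my macOS laptop, it works as expected. However, another remote Linux machine I have access to shows a similar scaling problem. This may be a platform-specific issue, but I'll leave the question up in case someone has seen it before and knows how to fix it.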
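
Because the behaviour differs between machines, one way to compare them might be to print what each one reports about its CPUs and threading settings. The sketch below uses only the standard library. The function name `describe_cpu_environment` is hypothetical, and the idea that CPU affinity or thread-count variables are involved is an assumption, not something the measurements above confirm.

```python
import os

# Environment variables commonly read by OpenMP, OpenBLAS and MKL.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def describe_cpu_environment() -> dict:
    """Collect CPU counts and thread-related environment variables."""
    info = {"cpu_count": os.cpu_count()}
    # os.sched_getaffinity is not available on every platform (macOS lacks it),
    # so fall back to None where it is missing.
    if hasattr(os, "sched_getaffinity"):
        info["usable_cpus"] = len(os.sched_getaffinity(0))
    else:
        info["usable_cpus"] = None
    for name in THREAD_VARS:
        info[name] = os.environ.get(name)  # None means the variable is unset
    return info


if __name__ == "__main__":
    for key, value in describe_cpu_environment().items():
        print(f"{key}: {value}")
```

If usable_cpus is smaller than cpu_count on the Linux machines, the job may be confined to fewer cores than num_cpus assumes, for example by a batch scheduler.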

Answer 1

You need to call multiprocessing.Pool inside the function and map it directly onto the NumPy matrix inversion.
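
One possible reading of that suggestion is sketched below: the matrices are created in the parent process and the pool maps a small wrapper around np.linalg.inv over them. The helper names `invert` and `make_matrices`, the matrix size, and the worker count are all illustrative choices, not taken from the question.

```python
import multiprocessing

import numpy as np


def invert(matrix: np.ndarray) -> np.ndarray:
    """Invert one square matrix; this runs inside a worker process."""
    return np.linalg.inv(matrix)


def make_matrices(count: int, n: int, seed: int = 0) -> list:
    """Create `count` random n-by-n matrices in the parent process."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((n, n)) for _ in range(count)]


if __name__ == "__main__":
    matrices = make_matrices(count=8, n=200)
    with multiprocessing.get_context("spawn").Pool(processes=4) as pool:
        # Each matrix is pickled, sent to a worker, inverted, and sent back.
        inverses = pool.map(invert, matrices)
    print(f"Inverted {len(inverses)} matrices.")
```

Note that this layout sends every matrix to a worker and every inverse back, so the transfer cost grows with the matrix size.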

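However, if the tasks are not large enough, I can't guarantee this will speed things up. The fixed cost of setting up each process may exceed the cost of the work that process actually does.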
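
Alongside process setup cost, it may be worth ruling out thread oversubscription. NumPy's linear algebra routines are often backed by a multithreaded BLAS library, and if every worker starts its own set of threads, the workers can end up competing for the same cores. Whether that is what happens on the asker's machine is an assumption; nothing in the question confirms it. A minimal sketch of capping each worker at one BLAS thread, with hypothetical helper names and deliberately small matrices:

```python
import os

# Assumption: NumPy is linked against a multithreaded BLAS (OpenBLAS, MKL, ...).
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(threads: int = 1) -> None:
    """Ask common BLAS backends to use at most `threads` threads."""
    for name in BLAS_THREAD_VARS:
        os.environ[name] = str(threads)


# These variables are typically read when the library loads, so this call
# has to happen before numpy is imported.
limit_blas_threads(1)

import multiprocessing
import time

import numpy as np


def invert_many(task_id: int, repeats: int = 5, n: int = 200) -> float:
    """Invert `repeats` random n-by-n matrices and return the elapsed seconds."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        np.linalg.inv(np.random.randn(n, n))
    return time.perf_counter() - t0


if __name__ == "__main__":
    # With "spawn", each worker re-imports this module and therefore applies
    # the same thread limit before it loads numpy.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=4) as pool:
        elapsed = pool.map(invert_many, range(8))
    print("Per-task seconds:", [round(t, 2) for t in elapsed])
```

If the per-task times stop growing with the number of workers once the limit is in place, oversubscription was a likely contributor. If they do not change, the cause is probably elsewhere.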

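"Embarrassingly parallel" is a term often used for GPU multiprocessing, although it applies to any workload that splits into independent tasks. Python currently has no native GPU acceleration.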
