Parallelization in Python


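I need to solve N independent constrained least-squares (LSQ) problems and would like to do it with parallel processing (e.g. N ~ 50k). I usually use Matlab's parfor, which makes this very simple. I want to do the same thing in Python.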

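I am using the following code, without success (note: my actual problem is considerably more complex, but the code below summarizes it and is easy to follow):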

import numpy as np
import multiprocessing

data = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])

U = np.zeros((4, 3))
def dummyFUNCTION(i):
    X = data
    X[:, i] *= 3
    return X[:, i]

# This is what I have and what I want
for i in range(0, 3):
    ui = dummyFUNCTION(i)
    U[:, i] = ui

print(U)

# This is my attempt to parallelize, and it is not working
# I am using IDLE Shell 3.12.1

U = np.zeros((4, 3))
with multiprocessing.Pool() as pool:
    ui = [pool.apply_async(dummyFUNCTION, [i]) for i in range(0, 3)]
    for idx, val in enumerate(ui):
        U[:, idx] = val.get()
    pool.close()

print(U)
parallel-processing
1 Answer

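Depending on the size of your original data, I would consider a different approach, such as saving it to disk and reading it back. This is especially true if you want to run on a cluster.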
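As a rough illustration of the disk-based idea, the sketch below writes the array once with np.save and lets each task open it memory-mapped with np.load, so only the requested column is read into memory. The file location and the helper name load_column are assumptions made for this example, not part of the original question.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and location, chosen only for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.npy")

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
np.save(path, data)  # write the array once, before any workers start


def load_column(i, path=path):
    """Open the file memory-mapped and copy out only column i."""
    mapped = np.load(path, mmap_mode="r")  # read-only, loaded lazily
    return np.array(mapped[:, i])          # independent in-memory copy


col = load_column(1)
print(col * 3)
```

Each worker then only needs the file path and a column index, which is cheap to pass around compared with the full array.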

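First, I modify your function so that it only handles the column it needs rather than the whole array (note that data[:, i] is a view, so X *= 3 still modifies data in place):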

def dummyFUNCTION(i):
    X = data[:, i]
    X *= 3
    return X
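To make the view-versus-copy behaviour of the function above concrete, here is a small sketch. The names triple_in_place and triple_copy are hypothetical and exist only for this comparison; the first mirrors the function above, the second returns a fresh array and leaves data alone.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])


def triple_in_place(i):
    X = data[:, i]   # basic slicing returns a view onto data
    X *= 3           # so this also changes data itself
    return X


def triple_copy(i):
    return data[:, i] * 3   # the multiplication allocates a new array


before = data.copy()
out = triple_copy(0)
unchanged = np.array_equal(data, before)    # data is untouched

view = triple_in_place(0)
shares = np.shares_memory(view, data)       # the result aliases data
changed = not np.array_equal(data, before)  # data was modified
```

If the tasks should not have side effects on shared data, which is usually safer when they run concurrently, the copying variant is probably the better default.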

Then I use the following function to run it in parallel:

from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm


def parallel_threads(fun, vec, pbar=True):
    with ThreadPoolExecutor() as executor:
        if pbar:
            results = list(tqdm(
                executor.map(fun, vec),
                total=len(vec)))
        else:
            results = list(
                executor.map(fun, vec))
    return results
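The helper above relies on executor.map returning results in the order of the inputs, not in the order the tasks finish. The sketch below checks this with a hypothetical slow_square function in which later inputs finish first.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    # Smaller inputs sleep longer, so they finish last.
    time.sleep(0.05 * (3 - x))
    return x * x


with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(slow_square, range(3)))

print(results)  # [0, 1, 4]
```

This is why the columns can be reassembled by position afterwards. Note also that total=len(vec) in the helper requires vec to have a length, so a range or list works but a generator would not.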

Finally, I run your code:

import numpy as np

data = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])

U = parallel_threads(dummyFUNCTION, range(3))
U = np.array(U).T

U is what you expected:

print(U)
[[ 3  6  9]
 [12 15 18]
 [21 24 27]
 [30 33 36]]
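If the real per-problem work is CPU-bound Python code, threads may be limited by the global interpreter lock, and a process pool is one possible alternative. The following is a minimal sketch under that assumption; triple_column is a hypothetical stand-in for the actual solver and run_in_processes is a name invented here.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def triple_column(column):
    """Stand-in for the real per-problem solver (hypothetical)."""
    return column * 3


def run_in_processes(data, max_workers=2):
    # Each column is pickled and sent to a worker process.
    columns = [data[:, i] for i in range(data.shape[1])]
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(triple_column, columns))
    return np.array(results).T


if __name__ == "__main__":
    data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
    print(run_in_processes(data))
```

The worker function is defined at module level and the pool is started under the __main__ guard, which matters on platforms that start workers by spawning a fresh interpreter. It is probably best to save this as a script and run it from a terminal, since an interactive shell such as IDLE may not be able to hand functions defined in the session to worker processes. For many small tasks, the chunksize argument of executor.map may reduce the communication overhead.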

Additional note: consider using Dask for this task: https://dask.discourse.group/t/most-efficient-way-to-implement-custom-functions-on-a-column-like-mean/757/5
