python multiprocessing.Pool.map does not speed up code that uses a partial function

Question

I am running a resampling task and trying to use multiprocessing to speed it up. This is the code I want to accelerate:

import numpy as np
def jackknife_resampling(value, weight, label):
    # label is an integer array, marking the subsample
    def _resample(i):
        _value = value[label != i]
        _weight = weight[label != i]
        return np.average(_value, weights=_weight)
    return list(map(_resample, np.unique(label)))

To use multiprocessing, I need to replace map with Pool.map and move the function _resample out of jackknife_resampling.

import time
from functools import partial
from multiprocessing import Pool
import numpy as np
def _resample(value, weight, label, i):
    _value = value[label != i]
    _weight = weight[label != i]
    return np.average(_value, weights=_weight)
def jackknife_resampling_mp(value, weight, label, Npro):
    my_pool = Pool(Npro)
    return my_pool.map(partial(_resample, value, weight, label), np.unique(label))


if __name__ == '__main__':
    Nsamp = 10_000_000
    value = np.random.uniform(0, 1, size=Nsamp)
    weight = np.random.uniform(1, 2, size=Nsamp)
    label = np.random.randint(0, 200, size=Nsamp)
    t = time.time()
    jackknife_resampling_mp(value, weight, label, 40)
    print('time', time.time()-t)

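Because the _resample function still needs the original value and weight arrays, I use partial to pass them in. But it turns out that this does not speed up the code. To confirm that Pool.map itself works, I tried global variables instead, and that ran well.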

def _resample(i):
    global value, weight, label
    _value = value[label != i]
    _weight = weight[label != i]
    return np.average(_value, weights=_weight)

def jackknife_resampling_mp(_value, _weight, _label, Npro):
    global value, weight, label
    value = _value
    weight = _weight
    label = _label
    # label is an integer array, marking the subsample
    my_pool = Pool(Npro)
    return my_pool.map(_resample, np.unique(label))

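This works, but I think it is ugly. PS: I also tried wrapping the whole resampler in a class, and that behaves the same as the version using partial (I am on Python 3.12, so instance methods can be pickled).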

Why is this? Is there a better way?

Answer 1

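You are pickling a large dataset through the process pool. The arrays bound by partial are part of the callable, so Pool.map serializes them and sends them to the workers with every batch of tasks, and that overhead cancels out the gain from parallelism. The global-variable version avoids this because only a reference to the function and the index i are sent (with the fork start method, the workers inherit the arrays from the parent). There are two ways to handle it: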

Use shared memory to reduce the overhead
Use a ThreadPoolExecutor for memory-bound tasks
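A minimal sketch of the shared-memory option, assuming Python 3.8+ (for multiprocessing.shared_memory). The names _init_worker, _arrays, _blocks, jackknife_resampling_shm and n_proc are made up for illustration and are not from the question. Each array is copied once into a shared block, and every worker attaches to the blocks in a Pool initializer, so only the index i is pickled per task.

```python
import numpy as np
from multiprocessing import Pool
from multiprocessing.shared_memory import SharedMemory

_arrays = {}  # per-worker NumPy views onto the shared blocks
_blocks = []  # keeps the SharedMemory handles alive inside each worker


def _init_worker(specs):
    # Runs once in every worker: attach to the blocks by name (no data copy).
    for key, (shm_name, shape, dtype) in specs.items():
        shm = SharedMemory(name=shm_name)
        _blocks.append(shm)
        _arrays[key] = np.ndarray(shape, dtype=dtype, buffer=shm.buf)


def _resample(i):
    # Only the integer i travels through the task queue.
    keep = _arrays["label"] != i
    return np.average(_arrays["value"][keep], weights=_arrays["weight"][keep])


def jackknife_resampling_shm(value, weight, label, n_proc):
    blocks, specs = [], {}
    try:
        for key, arr in (("value", value), ("weight", weight), ("label", label)):
            shm = SharedMemory(create=True, size=arr.nbytes)
            blocks.append(shm)
            # Copy the data into the shared block once, in the parent.
            np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)[:] = arr
            specs[key] = (shm.name, arr.shape, arr.dtype.str)
        with Pool(n_proc, initializer=_init_worker, initargs=(specs,)) as pool:
            return pool.map(_resample, np.unique(label))
    finally:
        # The parent owns the blocks: release and remove them when done.
        for shm in blocks:
            shm.close()
            shm.unlink()
```

It would be called like the original, for example jackknife_resampling_shm(value, weight, label, 40) under the if __name__ == '__main__' guard. The arrays are still module-level state inside each worker, but that state is set up in one place (the initializer) rather than through global statements in the calling code.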

The recommended approach is ThreadPoolExecutor.
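A minimal sketch of the thread-based variant, with the hypothetical names jackknife_resampling_threads and n_threads. Threads share the parent's memory, so nothing is pickled and _resample can stay a nested function as in the original serial code. Whether this actually runs faster depends on how much of the work happens inside NumPy calls that release the GIL, so it is worth timing on the real data.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def jackknife_resampling_threads(value, weight, label, n_threads):
    # label is an integer array, marking the subsample
    def _resample(i):
        keep = label != i
        return np.average(value[keep], weights=weight[keep])

    # The arrays are shared by reference between threads; no copies are made.
    with ThreadPoolExecutor(max_workers=n_threads) as executor:
        return list(executor.map(_resample, np.unique(label)))
```

The call mirrors the serial function: jackknife_resampling_threads(value, weight, label, 8). The thread count here is only an example value; a number close to the available cores is a reasonable starting point.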
