如何在Python中优化迭代NxM网格的计算

问题描述 投票:0回答:1

在Python中工作,我正在对NxM值网格进行一些物理计算,其中N从1变为3108,M从1变为2304(这对应于大图像)。我需要在这个空间的每个点计算一个值,总计约700万次计算。我目前的做法非常缓慢,我想知道是否有办法完成这项任务,而且不需要花费数小时......

我的第一种方法就是使用嵌套for循环,但这似乎是解决问题的最有效方法。我已经尝试过使用NumPy的nditer并分别遍历每个轴,但我读过它实际上并没有加快我的计算速度。我没有单独循环遍历每个轴,而是尝试制作一个三维阵列并在外轴上循环,如Brian的回答How can I, in python, iterate over multiple 2d lists at once, cleanly?所示。这是我的代码的当前状态:

import numpy as np
x,y = np.linspace(1,3108,num=3108),np.linspace(1,2304,num=2304) # x&y dimensions of image
X,Y = np.meshgrid(x,y,indexing='ij')
all_coords = np.dstack((X,Y)) # moves to 3-D
all_coords = all_coords.astype(int) # sets coords to int

作为参考,all_coords看起来像这样:

array([[[1.000e+00, 1.000e+00],
        [1.000e+00, 2.000e+00],
        [1.000e+00, 3.000e+00],
        ...,
        [1.000e+00, 2.302e+03],
        [1.000e+00, 2.303e+03],
        [1.000e+00, 2.304e+03]],

       [[2.000e+00, 1.000e+00],
        [2.000e+00, 2.000e+00],
        [2.000e+00, 3.000e+00],
        ...,
        [2.000e+00, 2.302e+03],
        [2.000e+00, 2.303e+03],
        [2.000e+00, 2.304e+03]],

等等。回到我的代码......

'''
- below is a function that does a calculation on the full grid using the distance between x0,y0 and each point on the grid.
- the function takes x0,y0 and returns the calculated values across the grid
'''
def do_calc(x0,y0):
    del_x, del_y = X-x0, Y-y0
    np.seterr(divide='ignore', invalid='ignore')
    dmx_ij = (del_x/((del_x**2)+(del_y**2))) # x component
    dmy_ij = (del_y/((del_x**2)+(del_y**2))) # y component
    return dmx_ij,dmy_ij

# now the actual loop

def do_loop():
    dmx,dmy = 0,0
    for pair in all_coords:
        for xi,yi in pair:
            DM = do_calc(xi,yi)
            dmx,dmy = dmx+DM[0],dmy+DM[1]
    return dmx,dmy

正如您可能看到的,这段代码需要花费相当长的时间才能运行...如果有任何方法可以修改我的代码,这样就不需要花费数小时才能完成,我将非常有兴趣知道如何做到这一点。在此先感谢您的帮助。

python performance numpy optimization iteration
1个回答
1
投票

这是一种在N=310, M=230上提供10,000倍加速的方法。由于该方法比原始代码更好地扩展,因此在整个问题规模上我预计会有超过一百万的因子。

该方法利用了问题的移位不变性。例如,del_x**2在每次调用do_calc时基本相同,所以我们只计算一次。

如果do_calc的输出在求和之前被加权,则问题不再是完全平移不变的,并且该方法不再起作用。然而,结果可以用线性卷积表示。在N=310, M=230,这仍然让我们的加速超过1,000倍。而且,再次,这将是更大的问题规模

原始问题的代码

import numpy as np

#N, M = 3108, 2304
N, M = 310, 230

### OP's code

x,y = np.linspace(1,N,num=N),np.linspace(1,M,num=M) # x&y dimensions of image
X,Y = np.meshgrid(x,y,indexing='ij')
all_coords = np.dstack((X,Y)) # moves to 3-D
all_coords = all_coords.astype(int) # sets coords to int

'''
- below is a function that does a calculation on the full grid using the distance between x0,y0 and each point on the grid.
- the function takes x0,y0 and returns the calculated values across the grid
'''
def do_calc(x0,y0):
    del_x, del_y = X-x0, Y-y0
    np.seterr(divide='ignore', invalid='ignore')
    dmx_ij = (del_x/((del_x**2)+(del_y**2))) # x component
    dmy_ij = (del_y/((del_x**2)+(del_y**2))) # y component
    return np.nan_to_num(dmx_ij), np.nan_to_num(dmy_ij)

# now the actual loop

def do_loop():
    dmx,dmy = 0,0
    for pair in all_coords:
        for xi,yi in pair:
            DM = do_calc(xi,yi)
            dmx,dmy = dmx+DM[0],dmy+DM[1]
    return dmx,dmy

from time import time

t = [time()]

### pp's code

x, y = np.ogrid[-N+1:N-1:2j*N - 1j, -M+1:M-1:2j*M - 1J]
den = x*x + y*y
den[N-1, M-1] = 1
xx = x / den
yy = y / den
for zz in xx, yy:
    zz[N:] -= zz[:N-1]
    zz[:, M:] -= zz[:, :M-1]
XX = xx.cumsum(0)[N-1:].cumsum(1)[:, M-1:]
YY = yy.cumsum(0)[N-1:].cumsum(1)[:, M-1:]
t.append(time())

### call OP's code for reference

X_OP, Y_OP = do_loop()
t.append(time())

# make sure results are equal

assert np.allclose(XX, X_OP)
assert np.allclose(YY, Y_OP)
print('pp {}\nOP {}'.format(*np.diff(t)))

样品运行:

pp 0.015251636505126953
OP 149.1642508506775

加权问题代码:

import numpy as np

#N, M = 3108, 2304
N, M = 310, 230

values = np.random.random((N, M))
x,y = np.linspace(1,N,num=N),np.linspace(1,M,num=M) # x&y dimensions of image
X,Y = np.meshgrid(x,y,indexing='ij')
all_coords = np.dstack((X,Y)) # moves to 3-D
all_coords = all_coords.astype(int) # sets coords to int

'''
- below is a function that does a calculation on the full grid using the distance between x0,y0 and each point on the grid.
- the function takes x0,y0 and returns the calculated values across the grid
'''
def do_calc(x0,y0, v):
    del_x, del_y = X-x0, Y-y0
    np.seterr(divide='ignore', invalid='ignore')
    dmx_ij = (del_x/((del_x**2)+(del_y**2))) # x component
    dmy_ij = (del_y/((del_x**2)+(del_y**2))) # y component
    return v*np.nan_to_num(dmx_ij), v*np.nan_to_num(dmy_ij)

# now the actual loop

def do_loop():
    dmx,dmy = 0,0
    for pair, vv in zip(all_coords, values):
        for (xi,yi), v in zip(pair, vv):
            DM = do_calc(xi,yi, v)
            dmx,dmy = dmx+DM[0],dmy+DM[1]
    return dmx,dmy

from time import time
from scipy import signal

t = [time()]
x, y = np.ogrid[-N+1:N-1:2j*N - 1j, -M+1:M-1:2j*M - 1J]
den = x*x + y*y
den[N-1, M-1] = 1
xx = x / den
yy = y / den
XX, YY = (signal.fftconvolve(zz, values, 'valid') for zz in (xx, yy))

t.append(time())
X_OP, Y_OP = do_loop()
t.append(time())
assert np.allclose(XX, X_OP)
assert np.allclose(YY, Y_OP)
print('pp {}\nOP {}'.format(*np.diff(t)))

样品运行:

pp 0.12683939933776855
OP 158.35225439071655
© www.soinside.com 2019 - 2024. All rights reserved.