如何高效计算和处理 3D NumPy 数组中的 3x3x3 体素邻域？

Question

我正在开发一个逐体素处理 3D 图像的函数。对于每个体素，我计算体素值与其 3x3x3 邻域之间的差异，应用基于距离的缩放，并根据最大缩放差异确定标签。然而，我目前的实现速度很慢，我正在寻找优化它的方法。

import numpy as np 
from tqdm import tqdm import itertools

def create_label(image_input): 
    # Define neighborhood offsets and distances 
    locations = np.array([tuple([dx, dy, dz]) for dx, dy, dz in itertools.product(range(-1, 2), repeat=3)]) 
    euclidian_distances = np.linalg.norm(locations, axis=1) 
    euclidian_distances_inverse = np.divide(1, euclidian_distances, out=np.zeros_like(euclidian_distances), where=euclidian_distances != 0)
    
    image_input = image_input.astype(np.float32)
    label = np.zeros_like(image_input)
    image_shape = image_input.shape
    
    # Process each voxel (excluding the edges)
    for i in tqdm(range(1, image_shape[0] - 1)):
        for j in range(1, image_shape[1] - 1):
            for k in range(1, image_shape[2] - 1):
                # Extract 3x3x3 neighborhood values
                all_values_neighborhood = [image_input[loc[0] + i, loc[1] + j, loc[2] + k] for loc in locations]
    
                # Compute differences and scale by distances
                centre_minus_rest = image_input[i, j, k, None] - np.array(all_values_neighborhood)
                if np.all(centre_minus_rest < 0):  
                    label[i, j, k] = 13
                else:
                    centre_minus_rest_divided = centre_minus_rest * euclidian_distances_inverse
                    centre_minus_rest_divided[13] = -100  # Ignore the center value
                    class_label = np.argmax(centre_minus_rest_divided)
                    label[i, j, k] = class_label

return label

# example 3d array
image_input = np.random.rand(10, 10, 10).astype(np.float32)
# run the function
label = create_label(image_input) 
print(label)

Answer 1

您可以使用

sliding_window_view

，并对 for 循环进行矢量化。

一次尝试

def create_label2(image_input):
    # Define neighborhood offsets and distances 
    locations = np.array([tuple([dx, dy, dz]) for dx, dy, dz in itertools.product(range(-1, 2), repeat=3)])
    euclidian_distances = np.linalg.norm(locations, axis=1)
    euclidian_distances_inverse = np.divide(1, euclidian_distances, out=np.zeros_like(euclidian_distances), where=euclidian_distances != 0)

    image_input = image_input.astype(np.float32)
    label = np.zeros_like(image_input)
    image_shape = image_input.shape

    neighborhood = np.lib.stride_tricks.sliding_window_view(image_input, (3,3,3))

    # Process each voxel (excluding the edges)
    centre_minus_rest = image_input[1:-1, 1:-1, 1:-1, None,None,None] - neighborhood
    centre_minus_rest_divided = centre_minus_rest * euclidian_distances_inverse.reshape(1,1,1,3,3,3)
    centre_minus_rest_divided[:,:,:,1,1,1] = -100
    class_label = np.argmax(centre_minus_rest_divided.reshape(image_shape[0]-2, image_shape[1]-2, image_shape[2]-2,27), axis=3)
    label[1:-1,1:-1,1:-1] = np.where((centre_minus_rest<0).all(axis=(3,4,5)), 13, class_label)

    return label

在我的机器上，增益为 ×50

它的作用是围绕 1..P-1 中的每个 z、y、x 点构建一个由所有 3x3x3 立方体组成的 (P-2)×(H-2)×(W-2)×3×3×3 数组, 1..H-1, 1..W-1

从那里开始，就可以在单个向量化操作中计算您在

for

循环中所做的事情

我说的是“尝试”，这可能显得不太自信，因为那是相当消耗内存的。不是

sliding_window_view

本身，它不需要任何成本：它只是一个视图，在内存中没有自己的数据（内存是

image_input

的内存。所以

neighborhood[12,13,14,1,1,1]

与

neighborhood[13,14,15,0,0,0]

相同。不仅仅是相同的值：相同的数据，相同的内存位置，如果您

neighborhood[12,13,14,1,1,1]=999

，然后

print(neighborhood[13,14,15,0,0,0])

或。

print(neighborhood[13,12,13,0,2,2])

，它会打印您的

背影。

所以

neighborhood

不需要任何成本。它的形状（大约相当于初始图像体积的 27 倍）是“虚拟的”。但是，即使它没有使用实际内存，该数组的值也比图像中的值多出近 27 倍（每个值重复 27 次——“几乎”部分是因为边缘）。因此，一旦你对其进行一些计算，你就会得到这样一个数组，此时所有值都在内存中自己的位置。

换句话说，如果你从 100x100x100 的图像开始，它使用 1000000 个内存位置（4000000 字节，因为这些是

np.float32

），

neighborhood

不需要任何成本：同样的 4000000 字节使其具有近 27000000 个值。但是

centre_minus_rest

，然后后续的计算确实占用了“几乎”27000000 字节的内存。

因此，根据图像的大小，我们可能会通过这样做将 cpu 时间问题替换为内存问题。

妥协也是可能的。

您可以保留第一个循环（

for i in range...

），然后对所有 j 和 k 的计算进行向量化

for i in range(1, image_shape[0]-1):
        centre_minus_rest = image_input[i, 1:-1, 1:-1, None,None,None] - neighborhood[i-1]
        centre_minus_rest_divided = centre_minus_rest * euclidian_distances_inverse.reshape(1,1,3,3,3)
        centre_minus_rest_divided[:,:,1,1,1] = -100
        class_label = np.argmax(centre_minus_rest_divided.reshape(image_shape[1]-2, image_shape[2]-2,27), axis=2)
        label[i,1:-1,1:-1] = np.where((centre_minus_rest<0).all(axis=(2,3,4)), 13, class_label)

在我的机器上也同样快。然而，中间值 (

centre_minus_rest

, ...) 要小 3 倍。

请注意，这仍然不是理想的代码。我应该在 3x3x3 值子数组、欧几里得距离……或大小为 27 的一维数组之间一次性选择大小。（但是

sliding_window_view

需要一个 3x3x3 数组，否则它可以发挥其魔力 => 重塑为 (P-2,H-2,W-2,27) 将需要完整副本；相反，

.argmax

可以不能沿超过 1 个轴工作，并且需要长度为 27 的数组）

此外，

itertools

技巧非常Python化，但根本不是numpython化。类似

np.arange(3)[:,None,None]**2+np.arange(3)[None,:,None]**2+np.arange(3)[None,None,:]

之类的东西... 但这无论如何都超出了昂贵的循环

还必须注意的是，矢量化的一个不便之处在于，它有时会迫使您计算更多您需要的东西（但如果您计算两倍的东西，但速度快了 100 倍，那就是 x50 的时间增益）。

在这里，我的

np.where((centre_minus_rest<0).all(axis=(2,3,4)), 13, class_label)

意味着对于每个答案为 13 的值，我徒劳地计算了

class_label

。你的

for for for if

不是。

如何高效计算和处理 3D NumPy 数组中的 3x3x3 体素邻域？

问题描述投票：0回答：1

1个回答

最新问题

如何高效计算和处理 3D NumPy 数组中的 3x3x3 体素邻域？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1