I have the following situation. I have an array of size (3, 128, n) (with n large); this array represents a picture. I have a super-resolution deep learning model that takes a (3, 128, 128) image as input and returns it at better quality. I want to apply my model to the whole picture.
My first solution to this problem was to split my array into arrays of size (3, 128, 128). I then have a list of square images, I can apply my model to each square, and then concatenate all the results to obtain a new (3, 128, n) image. The problem with this method is that the model does not perform well on the edges of each image.
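For reference, a minimal sketch of this naive tiling approach (assuming n is a multiple of 128; `model` stands in for the network and is not part of the original code):

import numpy as np

def upscale_tiles(image, model):
    # Split the (3, 128, n) image into n // 128 non-overlapping
    # (3, 128, 128) tiles along the last axis, run the model on each,
    # and concatenate the results back into a (3, 128, n) image.
    tiles = np.split(image, image.shape[-1] // 128, axis=-1)
    return np.concatenate([model(tile) for tile in tiles], axis=-1)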
To avoid this problem, I thought of another solution. Instead of considering non-overlapping square images, I can consider all the square images that can be extracted from the original image and pass all of them to my model. Then, to reconstruct the point of coordinates (a, b, c), I consider all the reconstructed square pictures that contain c and take a weighted average of them, where the average gives more weight to squares whose center is close to c. More specifically:

- I start by creating a new array (let's call it A_pad) of size 3*128*(n+2*127).
- For each i, let A_i = A_pad[:, :, i:i+128]. A_i has size (3*128*128) and can be fed to my model, which produces a new array B_i of the same size.
- I then want a new array B, defined as follows: for each (x, y, z), B[x, y, z] is the weighted average of the 128 values B_i[x, y, z+127-i] such that z <= i < z + 128, with weights 1 + min(z + 127 - i, i - z). This corresponds to taking the mean over all windows that contain z, with a weight proportional to the distance to the closest edge.

My question is about the computation of B. Given what I have described, I could write multiple for loops that would produce the correct result (see the reference sketch below), but I am afraid it would be slow. I am looking for a solution using numpy that is as fast as possible.
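As a correctness reference, here is a minimal, deliberately slow loop-based sketch of the computation of B described above (`model` is a placeholder callable; `lambda a: a` can be used for testing):

import numpy as np

def reconstruct_naive(image, model):
    n = image.shape[-1]
    # Pad 127 columns on each side so every original column is covered
    # by exactly 128 windows.
    A_pad = np.pad(image, ((0, 0), (0, 0), (127, 127)), mode='edge')
    # B_i = model(A_i) for each of the n + 127 windows A_i = A_pad[:, :, i:i+128].
    B_windows = [model(A_pad[:, :, i:i+128]) for i in range(n + 127)]
    B = np.zeros((3, 128, n))
    for z in range(n):
        num = np.zeros((3, 128))
        den = 0.0
        for i in range(z, z + 128):  # all windows containing original column z
            w = 1 + min(z + 127 - i, i - z)  # distance-to-closest-edge weight
            num += w * B_windows[i][:, :, z + 127 - i]
            den += w
        B[:, :, z] = num / den
    return B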
Here is a sample implementation following the steps outlined in your "my desired solution" section. It makes extensive use of np.lib.stride_tricks.as_strided, which might not seem obvious at first sight; I have added detailed comments to each usage for clarification. Also note that in your description you use z to denote the column position within an image, while in the comments I use the term n-position to comply with the shape specification via n.
Regarding efficiency, it is not obvious whether this approach wins. Computation happens in numpy, but the expression sliding_128 * weights builds a large array (128 times the size of the original image) before reducing it along the frame dimension. This definitely comes at a cost, and memory might even become an issue. A loop might come in handy at that position (see the sketch after the code).
Lines containing a comment prefixed with # [TEST] were added for testing purposes. Concretely, this means we overwrite the weights for the final sum of frames with 1/128 in order to recover the original image at the end (since no ML model transformation is applied either).
import numpy as np
n = 640 # For example.
image = np.random.randint(0, 256, size=(3, 128, n))
print('image.shape: ', image.shape) # (3, 128, 640)
padded = np.pad(image, ((0, 0), (0, 0), (127, 127)), mode='edge')
print('padded.shape: ', padded.shape) # (3, 128, 894)
sliding = np.lib.stride_tricks.as_strided(
    padded,
    # Frames stored along first dimension; sliding across last dimension of `padded`.
    shape=(padded.shape[-1]-128+1, 3, 128, 128),
    # First dimension: Moving one frame ahead -> move across last dimension of `padded`.
    # Remaining three dimensions: Move as within `padded`.
    strides=(padded.strides[-1:] + padded.strides)
)
print('sliding.shape: ', sliding.shape) # (767, 3, 128, 128)
# Now at this part we would feed the frames `sliding` to the ML model,
# where the first dimension is the batch size.
# Assume the output is assigned to `sliding` again.
# Since we're not using an ML model here, we create a copy instead
# in order to update the strides of `sliding` with its actual shape (as defined above).
sliding = sliding.copy()
sliding_128 = np.lib.stride_tricks.as_strided(
    # Reverse last dimension since we want the last column from the first frame.
    # Need to copy again because `[::-1]` creates a view with negative stride,
    # but we want actual reversal to work with the strides below.
    # (There's perhaps a smart way of adjusting the strides below in order to not make a copy here.)
    sliding[:, :, :, ::-1].copy(),
    # Second dimension corresponds to the 128 consecutive frames.
    # Previous last dimension is dropped since we're selecting the
    # column that corresponds to the current n-position.
    shape=(128, n, 3, 128),
    # First dimension (frame position): Move one frame and one column ahead
    # (we actually want to move one column *back* in `sliding`, but since we reversed
    # the column order we move one ahead instead) -> move across first dimension of
    # `sliding` + last dimension of `sliding`.
    # Second dimension (n-position): Moving one frame ahead -> move across first dimension of `sliding`.
    # Remaining two dimensions: Move within frames (channel and row dimensions).
    strides=((sliding.strides[0] + sliding.strides[-1],) + sliding.strides[:1] + sliding.strides[1:3])
)
print('sliding_128.shape: ', sliding_128.shape) # (128, 640, 3, 128)
# Weights are independent of the n-position -> we can precompute.
weights = 1 + np.concatenate([np.arange(64), np.arange(64)[::-1]])
weights = np.ones(shape=128) # [TEST] Assign weights for testing -> want to obtain the original image back.
weights = weights.astype(float) / weights.sum() # Normalize so the weighted sum is a weighted average.
weights = weights[:, None, None, None] # Prepare for broadcasting.
weighted_image = np.moveaxis(np.sum(sliding_128 * weights, axis=0), 0, 2)
print('weighted_image.shape: ', weighted_image.shape) # (3, 128, 640)
assert np.array_equal(image, weighted_image.astype(int)) # [TEST]
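As noted above, sliding_128 * weights materializes an array 128 times the image size before reducing. Here is a rough sketch of the loop alternative mentioned there, assuming the model output has been assigned back to `sliding` as in the code above; it accumulates one weighted column slice per frame offset instead of building the large temporary:

# Loop over the 128 possible offsets k = i - z of a window start i
# relative to the original column z it covers.
acc = np.zeros((3, 128, n))
total = 0.0
for k in range(128):
    w = 1.0 + min(127 - k, k)  # same weight as in `weights` above
    # Frame i = z + k sees original column z at local index 127 - k.
    cols = sliding[k:k + n, :, :, 127 - k]  # shape (n, 3, 128)
    acc += w * np.moveaxis(cols, 0, 2)      # accumulate as (3, 128, n)
    total += w
acc /= total
assert np.array_equal(image, acc.astype(int))  # [TEST] holds only because no model was applied

As a side note, on NumPy 1.20+ np.lib.stride_tricks.sliding_window_view(padded, 128, axis=-1) builds the same windows as the first as_strided call (up to an axis reordering) without manual stride arithmetic, which is less error-prone.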