I have the following situation. I have an array of size (3, 128, n) (with n large); this array represents a picture. I have a super-resolution deep learning model that takes a (3, 128, 128) image as input and returns it at better quality. I want to apply my model to the whole picture.
My first solution to this problem was to split my array into arrays of size (3, 128, 128). I then have a list of square images, I can apply my model to each square, and then concatenate all the results to obtain a new (3, 128, n) image. The problem with this method is that the model does not perform well on the edges of each image.
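For reference, a minimal sketch of this naive tiling approach (assuming n is a multiple of 128; `model` stands in for the network and is not part of the original code):

import numpy as np

def upscale_tiles(image, model):
    # Split the (3, 128, n) image into n // 128 non-overlapping
    # (3, 128, 128) tiles along the last axis, run the model on each,
    # and concatenate the results back into a (3, 128, n) image.
    tiles = np.split(image, image.shape[-1] // 128, axis=-1)
    return np.concatenate([model(tile) for tile in tiles], axis=-1)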
To avoid this problem, I thought of another solution. Instead of considering non-overlapping square images, I can consider all the square images that can be extracted from the original image and pass all of them to my model. Then, to reconstruct the point of coordinates (a, b, c), I consider all the reconstructed square pictures that contain c and take a weighted average of them, where the average gives more weight to squares whose center is close to c. More specifically:

- I start by creating a new array (let's call it A_pad) of size 3*128*(n+2*127).
- For each i, let A_i = A_pad[:, :, i:i+128]. A_i has size (3*128*128) and can be fed to my model, which produces a new array B_i of the same size.
- I then want a new array B, defined as follows: for each (x, y, z), B[x, y, z] is the weighted average of the 128 values B_i[x, y, z+127-i] such that z <= i < z + 128, with weights 1 + min(z + 127 - i, i - z). This corresponds to taking the mean over all windows that contain z, with a weight proportional to the distance to the closest edge.

My question is about the computation of B. Given what I have described, I could write multiple for loops that would produce the correct result (see the reference sketch below), but I am afraid it would be slow. I am looking for a solution using numpy that is as fast as possible.
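As a correctness reference, here is a minimal, deliberately slow loop-based sketch of the computation of B described above (`model` is a placeholder callable; `lambda a: a` can be used for testing):

import numpy as np

def reconstruct_naive(image, model):
    n = image.shape[-1]
    # Pad 127 columns on each side so every original column is covered
    # by exactly 128 windows.
    A_pad = np.pad(image, ((0, 0), (0, 0), (127, 127)), mode='edge')
    # B_i = model(A_i) for each of the n + 127 windows A_i = A_pad[:, :, i:i+128].
    B_windows = [model(A_pad[:, :, i:i+128]) for i in range(n + 127)]
    B = np.zeros((3, 128, n))
    for z in range(n):
        num = np.zeros((3, 128))
        den = 0.0
        for i in range(z, z + 128):  # all windows containing original column z
            w = 1 + min(z + 127 - i, i - z)  # distance-to-closest-edge weight
            num += w * B_windows[i][:, :, z + 127 - i]
            den += w
        B[:, :, z] = num / den
    return B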
Here is a sample implementation following the steps outlined in your "my desired solution" section. It makes extensive use of np.lib.stride_tricks.as_strided, which might not seem obvious at first sight; I have added detailed comments to each usage for clarification. Also note that in your description you use z to denote the column position within an image, while in the comments I use the term n-position to comply with the shape specification via n.
Regarding efficiency, it is not obvious whether this approach wins. Computation happens in numpy, but the expression sliding_128 * weights builds a large array (128 times the size of the original image) before reducing it along the frame dimension. This definitely comes at a cost, and memory might even become an issue. A loop might come in handy at that position (see the sketch after the code).
Lines containing a comment prefixed with # [TEST] were added for testing purposes. Concretely, this means we overwrite the weights for the final sum of frames with 1/128 in order to recover the original image at the end (since no ML model transformation is applied either).
import numpy as np
n = 640 # For example.
image = np.random.randint(0, 256, size=(3, 128, n))
print('image.shape: ', image.shape) # (3, 128, 640)
padded = np.pad(image, ((0, 0), (0, 0), (127, 127)), mode='edge')
print('padded.shape: ', padded.shape) # (3, 128, 894)
sliding = np.lib.stride_tricks.as_strided(
    padded,
    # Frames stored along first dimension; sliding across last dimension of `padded`.
    shape=(padded.shape[-1]-128+1, 3, 128, 128),
    # First dimension: Moving one frame ahead -> move across last dimension of `padded`.
    # Remaining three dimensions: Move as within `padded`.
    strides=(padded.strides[-1:] + padded.strides)
)
print('sliding.shape: ', sliding.shape) # (767, 3, 128, 128)
# Now at this part we would feed the frames `sliding` to the ML model,
# where the first dimension is the batch size.
# Assume the output is assigned to `sliding` again.
# Since we're not using an ML model here, we create a copy instead
# in order to update the strides of `sliding` with its actual shape (as defined above).
sliding = sliding.copy()
sliding_128 = np.lib.stride_tricks.as_strided(
    # Reverse last dimension since we want the last column from the first frame.
    # Need to copy again because `[::-1]` creates a view with negative stride,
    # but we want actual reversal to work with the strides below.
    # (There's perhaps a smart way of adjusting the strides below in order to not make a copy here.)
    sliding[:, :, :, ::-1].copy(),
    # Second dimension corresponds to the 128 consecutive frames.
    # Previous last dimension is dropped since we're selecting the
    # column that corresponds to the current n-position.
    shape=(128, n, 3, 128),
    # First dimension (frame position): Move one frame and one column ahead
    # (we actually want to move one column *back* in `sliding`, but since we reversed
    # the column order we move one ahead instead) -> move across first dimension of
    # `sliding` + last dimension of `sliding`.
    # Second dimension (n-position): Moving one frame ahead -> move across first dimension of `sliding`.
    # Remaining two dimensions: Move within frames (channel and row dimensions).
    strides=((sliding.strides[0] + sliding.strides[-1],) + sliding.strides[:1] + sliding.strides[1:3])
)
print('sliding_128.shape: ', sliding_128.shape) # (128, 640, 3, 128)
# Weights are independent of the n-position -> we can precompute.
weights = 1 + np.concatenate([np.arange(64), np.arange(64)[::-1]])
weights = np.ones(shape=128) # [TEST] Assign weights for testing -> want to obtain the original image back.
weights = weights.astype(float) / weights.sum() # Normalize so the weighted sum is a weighted average.
weights = weights[:, None, None, None] # Prepare for broadcasting.
weighted_image = np.moveaxis(np.sum(sliding_128 * weights, axis=0), 0, 2)
print('weighted_image.shape: ', weighted_image.shape) # (3, 128, 640)
assert np.array_equal(image, weighted_image.astype(int)) # [TEST]
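As noted above, sliding_128 * weights materializes an array 128 times the image size before reducing. Here is a rough sketch of the loop alternative mentioned there, assuming the model output has been assigned back to `sliding` as in the code above; it accumulates one weighted column slice per frame offset instead of building the large temporary:

# Loop over the 128 possible offsets k = i - z of a window start i
# relative to the original column z it covers.
acc = np.zeros((3, 128, n))
total = 0.0
for k in range(128):
    w = 1.0 + min(127 - k, k)  # same weight as in `weights` above
    # Frame i = z + k sees original column z at local index 127 - k.
    cols = sliding[k:k + n, :, :, 127 - k]  # shape (n, 3, 128)
    acc += w * np.moveaxis(cols, 0, 2)      # accumulate as (3, 128, n)
    total += w
acc /= total
assert np.array_equal(image, acc.astype(int))  # [TEST] holds only because no model was applied

As a side note, on NumPy 1.20+ np.lib.stride_tricks.sliding_window_view(padded, 128, axis=-1) builds the same windows as the first as_strided call (up to an axis reordering) without manual stride arithmetic, which is less error-prone.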