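I have a large image and a smaller "kernel" image. I want to compare the kernel with every part of the image (by "sliding" the kernel over the image) and retrieve a "similarity tensor" that indicates how similar the kernel is to the image patch at each shifted position. This is very similar to conv2d, but my kernel is an RGB/HSV image and is therefore not a valid convolution kernel.
For example, if my kernel image is black, conv2d returns 0 everywhere as the similarity to any image, but if the large image is black in some region, I want the similarity there to be 1.
I wrote this inefficient method to demonstrate what I want: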
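To make the conv2d problem concrete, here is a small illustrative sketch (not from the original post; the 6×6 image and the helper `window_similarity` are hypothetical). With an all-black kernel every conv2d weight is zero, so the response is zero everywhere, while the desired similarity (1 minus the mean absolute difference) is 1 on a black window and 0 on a white one.

```python
import torch
import torch.nn.functional as F

# Hypothetical 6x6 RGB image: left half black (0.0), right half white (1.0)
image = torch.zeros(1, 3, 6, 6)
image[..., 3:] = 1.0

# All-black 3x3 RGB "kernel" image
kernel = torch.zeros(3, 3, 3)

# conv2d uses the kernel as weights ([out_channels=1, in_channels=3, 3, 3]);
# all-zero weights give a zero response everywhere, whatever the image contains
conv_out = F.conv2d(image, kernel.unsqueeze(0))
print(conv_out.abs().max().item())  # 0.0


# Hypothetical helper: desired similarity = 1 - mean absolute difference
def window_similarity(y, x):
    patch = image[0, :, y:y + 3, x:x + 3]
    return (1 - (patch - kernel).abs().sum() / kernel.nelement()).item()


sim_black = window_similarity(0, 0)  # window lies fully on the black half
sim_white = window_similarity(0, 3)  # window lies fully on the white half
print(sim_black, sim_white)  # 1.0 0.0
```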
```python
import torch
import torch.nn.functional as F

def tensor_kernel_similarity(image, kernel, stride = 1):
    assert len(kernel.shape) == 3
    assert len(image.shape) == 4
    similarity = torch.zeros((1,
                              int((image.shape[-2]-kernel.shape[-2])/stride)+1,
                              int((image.shape[-1]-kernel.shape[-1])/stride)+1), dtype=torch.float, device=image.device)
    for b in range(0, similarity.shape[0]):
        for y in range(0, similarity.shape[1]):
            for x in range(0, similarity.shape[2]):
                image_patch = image[b,:, y*stride:y*stride+kernel.shape[1], x*stride:x*stride+kernel.shape[2]]
                # pad image patch to kernel size:
                if kernel.shape[2]-image_patch.shape[2] + kernel.shape[1]-image_patch.shape[1] > 0:
                    image_patch = F.pad(image_patch, (0, kernel.shape[2]-image_patch.shape[2], 0, kernel.shape[1]-image_patch.shape[1]), value=0)
                # calculate the diff between the image patch and the kernel (in the range [0,1])
                diff = image_patch.sub(kernel).abs().sum()/kernel.nelement()
                # similarity is 1 minus the diff
                sim = 1 - diff
                # store similarity in the output tensor
                similarity[b,y,x] = sim
    return similarity
```
Is there a way to use conv2d to get what I want? Or is there a way to speed up this code significantly?
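You can use `torch.Tensor.unfold()` to get a significant speedup. With `unfold()`, you can generate sliding-window views into the given `image` tensor, which you can then compare with `kernel` without explicit `for` loops (see the code comments for the intermediate tensor shapes):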
```python
def proposed(image, kernel, stride=1):
    # ^ image: [b, c, hi, wi], kernel: [c, hk, wk], stride: s
    patched = image.unfold(-2, kernel.shape[-2], stride)
    # ^ patched: [b, c, (hi-hk)//s+1, wi, hk]
    patched = patched.unfold(-2, kernel.shape[-1], stride)
    # ^ patched: [b, c, (hi-hk)//s+1, (wi-wk)//s+1, hk, wk]
    patched = patched.movedim(1, -3)
    # ^ patched: [b, (hi-hk)//s+1, (wi-wk)//s+1, c, hk, wk]
    sim = patched.sub(kernel).abs().sum(dim=(-3, -2, -1)).mul_(1 / kernel.nelement()).neg_().add_(1)
    # ^ sim: [b, (hi-hk)//s+1, (wi-wk)//s+1]
    return sim
```
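As an illustration of what the two `unfold()` calls produce, the sketch below applies them to a small tensor with hypothetical sizes (b=2, c=3, hi=7, wi=8, hk=3, wk=2, stride 2). It checks the window shape, that window (i, j) equals the corresponding image slice, and that the result is still a view of `image` (no data is copied until something like `patched.sub(kernel)` is computed).

```python
import torch

# Hypothetical sizes chosen only for illustration
b, c, hi, wi = 2, 3, 7, 8
hk, wk, s = 3, 2, 2
image = torch.arange(b * c * hi * wi, dtype=torch.float).reshape(b, c, hi, wi)

# First unfold: windows along the height -> [b, c, (hi-hk)//s+1, wi, hk]
# Second unfold: windows along the width -> [b, c, (hi-hk)//s+1, (wi-wk)//s+1, hk, wk]
patched = image.unfold(-2, hk, s).unfold(-2, wk, s)
print(patched.shape)  # torch.Size([2, 3, 3, 4, 3, 2])

# Window (i, j) is exactly the slice starting at row i*s and column j*s
i, j = 1, 2
window_matches = torch.equal(patched[:, :, i, j], image[:, :, i * s:i * s + hk, j * s:j * s + wk])
print(window_matches)  # True

# unfold() returns a view: a write into `image` is visible through `patched`
image[0, 0, 0, 0] = -1.0
print(patched[0, 0, 0, 0, 0, 0].item())  # -1.0

# Elements that patched.sub(kernel) would materialize vs. elements in the image
print(patched.numel(), image.numel())  # 432 336
```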
Here is how I tested it:
```python
from timeit import Timer
import torch
import torch.nn.functional as F
torch.manual_seed(42)
# TODO: Adjust parameters as necessary
device = torch.device("cpu")
stride = 2
b, c, hi, wi = 1, 3, 101, 101
hk, wk = 10, 9
assert b == 1 # In `tensor_kernel_similarity()`, the value of `similarity.shape[0]` is hard-coded to 1
# Create some random data
img = torch.rand(size=(b, c, hi, wi)).to(device)
knl = torch.rand(size=(c, hk, wk)).to(device)
given = tensor_kernel_similarity(img, knl, stride=stride)
own = proposed(img, knl, stride=stride)
assert torch.allclose(given, own)
print("Timing given:", Timer(lambda: tensor_kernel_similarity(img, knl, stride=stride)).timeit(100))
print("Timing proposed:", Timer(lambda: proposed(img, knl, stride=stride)).timeit(100))
```
On the CPU, this gives me:
Timing given: 9.2…
Timing proposed: 0.05…
On the GPU (`device = torch.device("cuda:0")`):
Timing given: 18.8…
Timing proposed: 0.009…
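The timing script asserts `b == 1` only because the question's loop hard-codes the batch dimension; `proposed()` itself handles any batch size. As a sketch (using a condensed copy of the question's loop with the padding branch dropped, since with this output size every window fits inside the image), a batch can be checked by running the loop once per image and concatenating the results. The sizes are arbitrary.

```python
import torch


def tensor_kernel_similarity(image, kernel, stride=1):
    # Condensed copy of the question's loop; output batch size hard-coded to 1
    hk, wk = kernel.shape[-2], kernel.shape[-1]
    similarity = torch.zeros((1,
                              (image.shape[-2] - hk) // stride + 1,
                              (image.shape[-1] - wk) // stride + 1),
                             dtype=torch.float, device=image.device)
    for y in range(similarity.shape[1]):
        for x in range(similarity.shape[2]):
            patch = image[0, :, y * stride:y * stride + hk, x * stride:x * stride + wk]
            similarity[0, y, x] = 1 - patch.sub(kernel).abs().sum() / kernel.nelement()
    return similarity


def proposed(image, kernel, stride=1):
    # unfold-based version from the answer
    patched = image.unfold(-2, kernel.shape[-2], stride)
    patched = patched.unfold(-2, kernel.shape[-1], stride)
    patched = patched.movedim(1, -3)
    return patched.sub(kernel).abs().sum(dim=(-3, -2, -1)).mul_(1 / kernel.nelement()).neg_().add_(1)


torch.manual_seed(0)
stride = 2
img = torch.rand(size=(4, 3, 31, 29))  # batch of 4 images
knl = torch.rand(size=(3, 5, 4))

batched = proposed(img, knl, stride=stride)
# Run the loop version once per image (each call sees a batch of 1) and stack the results
per_image = torch.cat([tensor_kernel_similarity(img[i:i + 1], knl, stride=stride)
                       for i in range(img.shape[0])])
print(batched.shape)  # torch.Size([4, 14, 13])
print(torch.allclose(batched, per_image))  # True
```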
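Some caveats:
- When computing `patched.sub(kernel)`, the memory of `patched` has to be inflated to the full size of the sliding-window view. If you run into limits here, consider a hybrid between your current approach and mine.
- I tested all combinations of `stride in [1, 2, 3]`, `hi in [99, 100, 101]`, `wi in [99, 100, 101]`, `hk in [9, 10, 11]`, and `wk in [9, 10, 11]`, but maybe I overlooked some cases.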
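The hybrid suggested above could look roughly like this. It is a minimal sketch, assuming the memory peak comes from materializing `patched.sub(kernel)`: it loops over blocks of output rows and vectorizes with `unfold()` within each block, so the temporary tensor only ever holds `rows_per_chunk` rows of windows. `chunked_similarity` and `rows_per_chunk` are hypothetical names, and the sizes in the usage check are arbitrary.

```python
import torch


def proposed(image, kernel, stride=1):
    # unfold-based version from the answer, used here as the reference
    patched = image.unfold(-2, kernel.shape[-2], stride).unfold(-2, kernel.shape[-1], stride)
    patched = patched.movedim(1, -3)
    return patched.sub(kernel).abs().sum(dim=(-3, -2, -1)).mul_(1 / kernel.nelement()).neg_().add_(1)


def chunked_similarity(image, kernel, stride=1, rows_per_chunk=8):
    # Hypothetical hybrid: Python loop over blocks of output rows, unfold() within each block
    hk, wk = kernel.shape[-2], kernel.shape[-1]
    n_rows = (image.shape[-2] - hk) // stride + 1
    chunks = []
    for r0 in range(0, n_rows, rows_per_chunk):
        r1 = min(r0 + rows_per_chunk, n_rows)
        # Image rows needed for output rows r0 .. r1-1
        band = image[..., r0 * stride:(r1 - 1) * stride + hk, :]
        patched = band.unfold(-2, hk, stride).unfold(-2, wk, stride).movedim(1, -3)
        # Only this block's windows are materialized by sub(); mean == sum / kernel.nelement()
        chunks.append(1 - patched.sub(kernel).abs().mean(dim=(-3, -2, -1)))
    return torch.cat(chunks, dim=1)


torch.manual_seed(0)
img = torch.rand(size=(2, 3, 45, 40))
knl = torch.rand(size=(3, 6, 5))
full = proposed(img, knl, stride=2)
chunked = chunked_similarity(img, knl, stride=2, rows_per_chunk=3)
print(full.shape)  # torch.Size([2, 20, 18])
print(torch.allclose(full, chunked))  # True
```

Smaller `rows_per_chunk` values lower peak memory at the cost of more Python-level iterations; setting it to the number of output rows recovers the fully vectorized version.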