PyTorch: computing a sliding "image similarity" score similar to conv2d

Question

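I have a large image and a smaller "kernel" image. I want to compare the kernel with every part of the image (by "sliding" the kernel over the image) and get back a "similarity tensor" that says how similar the kernel is to the image patch at each position. This is very similar to conv2d, but my kernel is an RGB/HSV image, so it is not a valid convolution kernel.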

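For example, if my kernel image is black, conv2d returns 0 as the similarity everywhere, whatever the image is. But if the large image is black in some region, I want the similarity there to be 1.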
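To make the conv2d problem concrete, here is a minimal sketch on made-up toy tensors (image and kernel below are illustrative, not from the question). conv2d computes a weighted sum, so an all-black (all-zero) kernel scores 0 at every position. A score of 1 minus the mean absolute difference, which is what the code below computes, gives 1 over the black region and 0 over a white region.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy image: batch 1, 3 channels, 4x4, white except a black 2x2 top-left region
image = torch.ones(1, 3, 4, 4)
image[:, :, :2, :2] = 0.0

# Hypothetical all-black 3x2x2 "kernel" image
kernel = torch.zeros(3, 2, 2)

# conv2d expects weights shaped [out_channels, in_channels, kh, kw]
conv_scores = F.conv2d(image, kernel.unsqueeze(0))  # shape [1, 1, 3, 3], all zeros

# Desired score: 1 - mean absolute difference between patch and kernel
sim_top_left = 1 - (image[0, :, :2, :2] - kernel).abs().mean()      # black patch
sim_bottom_right = 1 - (image[0, :, 2:, 2:] - kernel).abs().mean()  # white patch

print(conv_scores.abs().max().item(), sim_top_left.item(), sim_bottom_right.item())
```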

I wrote this inefficient method to demonstrate what I want:

import torch
import torch.nn.functional as F


def tensor_kernel_similarity(image, kernel, stride=1):
    # image: [b, c, hi, wi], kernel: [c, hk, wk]
    assert len(kernel.shape) == 3
    assert len(image.shape) == 4

    # note: the batch size of the output is hard-coded to 1
    similarity = torch.zeros((1,
                              int((image.shape[-2]-kernel.shape[-2])/stride)+1,
                              int((image.shape[-1]-kernel.shape[-1])/stride)+1), dtype=torch.float, device=image.device)
    for b in range(0, similarity.shape[0]):
        for y in range(0, similarity.shape[1]):
            for x in range(0, similarity.shape[2]):
                image_patch = image[b, :, y*stride:y*stride+kernel.shape[1], x*stride:x*stride+kernel.shape[2]]

                # pad image patch to kernel size:
                if kernel.shape[2]-image_patch.shape[2] + kernel.shape[1]-image_patch.shape[1] > 0:
                    image_patch = F.pad(image_patch, (0, kernel.shape[2]-image_patch.shape[2], 0, kernel.shape[1]-image_patch.shape[1]), value=0)

                # mean absolute difference between the image patch and the kernel
                # (in the range [0, 1] if pixel values are in [0, 1])
                diff = image_patch.sub(kernel).abs().sum()/kernel.nelement()
                # similarity is 1 minus the diff
                sim = 1 - diff
                # store similarity in the output tensor
                similarity[b, y, x] = sim
    return similarity

Is there a way to use conv2d to get what I want? Or is there a way to speed up this code significantly?

Tags: pytorch, convolution

Answer

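You can use torch.Tensor.unfold() to get a significant speed-up. With unfold(), you can create sliding-window views into the given image tensor, which you can then compare with kernel without explicit for loops (see the code comments for the intermediate tensor shapes):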

def proposed(image, kernel, stride=1):
    # ^ image: [b, c, hi, wi], kernel: [c, hk, wk], stride: s
    patched = image.unfold(-2, kernel.shape[-2], stride)
    # ^ patched: [b, c, (hi-hk)//s+1, wi, hk]
    patched = patched.unfold(-2, kernel.shape[-1], stride)
    # ^ patched: [b, c, (hi-hk)//s+1, (wi-wk)//s+1, hk, wk]
    patched = patched.movedim(1, -3)
    # ^ patched: [b, (hi-hk)//s+1, (wi-wk)//s+1, c, hk, wk]
    sim = patched.sub(kernel).abs().sum(dim=(-3, -2, -1)).mul_(1 / kernel.nelement()).neg_().add_(1)
    # ^ sim: [b, (hi-hk)//s+1, (wi-wk)//s+1]
    return sim
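To see what the two unfold() calls in proposed() produce, here is a small sketch on a hypothetical 4x4 single-channel tensor holding the values 0 to 15, with a 2x2 window and stride 2. The first call slides along the rows and appends the window height as the last dimension; the second slides along the columns (now at dim -2) and appends the window width, so each [..., i, j, :, :] slice is one hk x wk patch in natural row/column order.

```python
import torch

# Hypothetical toy "image": batch 1, 1 channel, 4x4, values 0..15 so windows are easy to read
x = torch.arange(16).view(1, 1, 4, 4)

# Same two-step unfold as in proposed(), with hk = wk = 2 and stride = 2
rows = x.unfold(-2, 2, 2)        # [1, 1, 2, 4, 2]: 2 row windows, window height moved last
windows = rows.unfold(-2, 2, 2)  # [1, 1, 2, 2, 2, 2]: [b, c, n_rows, n_cols, hk, wk]

print(rows.shape, windows.shape)
print(windows[0, 0, 0, 0])  # top-left window: image rows 0-1, cols 0-1
print(windows[0, 0, 1, 1])  # bottom-right window: image rows 2-3, cols 2-3
```

Both results are views into x, so no data is copied until an operation such as the subtraction in proposed() creates a new tensor.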

This is how I tested it:

from timeit import Timer

import torch
import torch.nn.functional as F

torch.manual_seed(42)

# TODO: Adjust parameters as necessary
device = torch.device("cpu")
stride = 2
b, c, hi, wi = 1, 3, 101, 101
hk, wk = 10, 9
assert b == 1  # In `tensor_kernel_similarity()`, the value of `similarity.shape[0]` is hard-coded to 1

# Create some random data
img = torch.rand(size=(b, c, hi, wi)).to(device)
knl = torch.rand(size=(c, hk, wk)).to(device)

given = tensor_kernel_similarity(img, knl, stride=stride)
own = proposed(img, knl, stride=stride)
assert torch.allclose(given, own)
print("Timing given:", Timer(lambda: tensor_kernel_similarity(img, knl, stride=stride)).timeit(100))
print("Timing proposed:", Timer(lambda: proposed(img, knl, stride=stride)).timeit(100))

On the CPU, this gives me:

Timing given: 9.2…
Timing proposed: 0.05…

On the GPU (device = torch.device("cuda:0")):

Timing given: 18.8…
Timing proposed: 0.009…

Some notes:

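  • You may notice that there is a time-memory trade-off here. In particular,
    unfold() and movedim() only return views, but patched.sub(kernel) has to
    materialize a new tensor with the full size of the sliding-window view,
    [b, (hi-hk)//s+1, (wi-wk)//s+1, c, hk, wk]. If you run into memory limits here, consider a hybrid between your current approach and mine.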
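  • I am not sure whether your current approach and mine handle all cases the same way. I successfully tested all combinations of
    stride in [1, 2, 3],
    hi in [99, 100, 101],
    wi in [99, 100, 101],
    hk in [9, 10, 11] and
    wk in [9, 10, 11],
    but maybe I overlooked some cases.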
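A minimal sketch of such a hybrid, assuming that processing a fixed number of output rows at a time is an acceptable way to bound memory. chunked_similarity and rows_per_chunk are hypothetical names, not part of the question or the answer. Each iteration unfolds only the band of image rows needed for a few output rows, so the temporary tensor created by the subtraction scales with rows_per_chunk rather than with the full output height.

```python
import torch


def chunked_similarity(image, kernel, stride=1, rows_per_chunk=16):
    # Hypothetical hybrid of the loop and unfold approaches.
    # image: [b, c, hi, wi], kernel: [c, hk, wk]; assumes the kernel fits inside the image.
    hk, wk = kernel.shape[-2], kernel.shape[-1]
    n_rows = (image.shape[-2] - hk) // stride + 1
    chunks = []
    for start in range(0, n_rows, rows_per_chunk):
        stop = min(start + rows_per_chunk, n_rows)
        # Only the image rows needed for output rows start..stop-1
        band = image[..., start * stride:(stop - 1) * stride + hk, :]
        patched = band.unfold(-2, hk, stride).unfold(-2, wk, stride)
        # [b, c, rows, cols, hk, wk] -> [b, rows, cols, c, hk, wk]
        patched = patched.movedim(1, -3)
        # 1 - mean absolute difference over (c, hk, wk), as in proposed()
        diff = (patched - kernel).abs().mean(dim=(-3, -2, -1))
        chunks.append(1 - diff)
    return torch.cat(chunks, dim=1)  # [b, n_rows, n_cols]
```

With rows_per_chunk at least as large as the output height, this does the same work as proposed() in a single step; with rows_per_chunk=1 it behaves like a per-row loop that is vectorized across columns.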