PyTorch: computing a conv2d-like sliding "image similarity" score

Question

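I have a large image and a smaller "kernel" image. I want to compare the kernel with every part of the image (by "sliding" the kernel across the image) and get back a "similarity tensor" that indicates how similar the kernel is to the image patch at each shifted position. This is very close to conv2d, but my kernel is an RGB/HSV image and therefore not a valid convolution kernel.

For example, if my kernel image is black, conv2d returns 0 everywhere as the similarity to any image, but I want the similarity to be 1 wherever the large image is black.

I wrote this inefficient method to demonstrate what I want: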
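To make the problem concrete, here is a minimal sketch of the black-kernel case. The tensor sizes are arbitrary choices for illustration, and the kernel gets an extra leading dimension because `F.conv2d` expects weights shaped `[out_channels, in_channels, hk, wk]`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Arbitrary illustrative sizes: one 3-channel 8x8 image, one 4x4 black kernel.
image = torch.rand(1, 3, 8, 8)
black = torch.zeros(1, 3, 4, 4)  # [out_channels, in_channels, hk, wk]

# Every product with a zero weight is zero, so the whole response map is zero,
# no matter what the image contains.
response = F.conv2d(image, black)
```

The response has one value per sliding position, but it carries no information about whether the image itself is black there.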

def tensor_kernel_similarity(image, kernel, stride = 1):

    assert len(kernel.shape) == 3
    assert len(image.shape) == 4

    similarity = torch.zeros((1,
                              int((image.shape[-2]-kernel.shape[-2])/stride)+1, 
                             int((image.shape[-1]-kernel.shape[-1])/stride)+1), dtype=torch.float, device=image.device)
    for b in range(0, similarity.shape[0]):
        for y in range(0, similarity.shape[1]):
            for x in range(0, similarity.shape[2]):
                image_patch = image[b,:, y*stride:y*stride+kernel.shape[1], x*stride:x*stride+kernel.shape[2]]
                
                #pad image patch to kernel size:
                if kernel.shape[2]-image_patch.shape[2] + kernel.shape[1]-image_patch.shape[1] > 0:
                    image_patch = F.pad(image_patch, (0, kernel.shape[2]-image_patch.shape[2], 0, kernel.shape[1]-image_patch.shape[1]), value=0)
                
                # calculate the diff between the image patch and the kernel (in the range [0,1])
                diff = image_patch.sub(kernel).abs().sum()/kernel.nelement()
                # similarity is the inverse of the diff
                sim = 1 - diff
                # store similarity in the output tensor                
                similarity[b,y,x] = sim
    return similarity

Is there a way to use conv2d to get what I want? Or is there a way to significantly speed up this code?

Answer 1

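You can use `torch.Tensor.unfold()` to get a significant speedup. With `unfold()`, you can create a sliding-window view of the given `image` tensor and then compare it against `kernel` without explicit `for` loops (see the code comments for the intermediate tensor shapes):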

def proposed(image, kernel, stride=1):
    # ^ image: [b, c, hi, wi], kernel: [c, hk, wk], stride: s
    patched = image.unfold(-2, kernel.shape[-2], stride)
    # ^ patched: [b, c, (hi-hk)//s+1, wi, hk]
    patched = patched.unfold(-2, kernel.shape[-1], stride)
    # ^ patched: [b, c, (hi-hk)//s+1, (wi-wk)//s+1, hk, wk]
    patched = patched.movedim(1, -3)
    # ^ patched: [b, (hi-hk)//s+1, (wi-wk)//s+1, c, hk, wk]
    sim = patched.sub(kernel).abs().sum(dim=(-3, -2, -1)).mul_(1 / kernel.nelement()).neg_().add_(1)
    # ^ sim: [b, (hi-hk)//s+1, (wi-wk)//s+1]
    return sim
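If the shape comments are hard to follow, a tiny example may help. This is only a sketch with made-up sizes (a 4x5 single-channel image and a 2x3 window); it shows that each `unfold()` call replaces the unfolded dimension with the window count and appends the window size as a new last dimension:

```python
import torch

# Made-up sizes for illustration: batch 1, 1 channel, 4x5 image, 2x3 window.
image = torch.arange(20.0).reshape(1, 1, 4, 5)
hk, wk, stride = 2, 3, 1

rows = image.unfold(-2, hk, stride)
# rows: [1, 1, 3, 5, 2]  (3 windows along the height, window size 2 appended)
windows = rows.unfold(-2, wk, stride)
# windows: [1, 1, 3, 3, 2, 3]  (3 windows along the width, window size 3 appended)

# The window at output position (y, x) matches the plain image slice.
y, x = 1, 2
patch = image[0, 0, y * stride:y * stride + hk, x * stride:x * stride + wk]
same = torch.equal(windows[0, 0, y, x], patch)
```

Note that the second call uses `-2` again: after the first `unfold()`, the width dimension has moved to second-to-last position because the window dimension was appended after it.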

This is how I tested it:

from timeit import Timer

import torch
import torch.nn.functional as F

torch.manual_seed(42)

# TODO: Adjust parameters as necessary
device = torch.device("cpu")
stride = 2
b, c, hi, wi = 1, 3, 101, 101
hk, wk = 10, 9
assert b == 1  # In `tensor_kernel_similarity()`, the value of `similarity.shape[0]` is hard-coded to 1

# Create some random data
img = torch.rand(size=(b, c, hi, wi)).to(device)
knl = torch.rand(size=(c, hk, wk)).to(device)

given = tensor_kernel_similarity(img, knl, stride=stride)
own = proposed(img, knl, stride=stride)
assert torch.allclose(given, own)
print("Timing given:", Timer(lambda: tensor_kernel_similarity(img, knl, stride=stride)).timeit(100))
print("Timing proposed:", Timer(lambda: proposed(img, knl, stride=stride)).timeit(100))

On the CPU, this gives me:

Timing given: 9.2…
Timing proposed: 0.05…

On the GPU (`device = torch.device("cuda:0")`):

Timing given: 18.8…
Timing proposed: 0.009…
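One caveat about GPU timings in general: CUDA kernels are launched asynchronously, so a timer that does not wait for the device may mostly measure launch overhead. Below is a rough sketch of a timing helper that synchronizes around the measured call. The name `timed` and the small workload are my own placeholders, not part of the test above, and the snippet falls back to the CPU when no GPU is available:

```python
from timeit import Timer

import torch


def timed(fn, device, number=100):
    # Hypothetical helper: make sure queued CUDA work is finished
    # before the clock starts and after every call of fn.
    def run():
        fn()
        if device.type == "cuda":
            torch.cuda.synchronize(device)

    if device.type == "cuda":
        torch.cuda.synchronize(device)
    return Timer(run).timeit(number)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.rand(1, 3, 101, 101, device=device)

# Placeholder workload, just to show how the helper is called.
elapsed = timed(lambda: x.unfold(-2, 10, 2).unfold(-2, 9, 2).abs().sum(), device, number=10)
```

I have not re-run the numbers above with this helper, so treat them as indicative only.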

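Some notes:

You may notice a time-versus-memory tradeoff here. In particular, for `patched.sub(kernel)`, the memory of `patched` has to be expanded to the full size of the sliding-window view. If you run into limits here, consider a hybrid between your current approach and mine.
I am not sure whether your current approach and mine handle all cases in the same way. I successfully tested all combinations of `stride in [1, 2, 3]`, `hi in [99, 100, 101]`, `wi in [99, 100, 101]`, `hk in [9, 10, 11]`, and `wk in [9, 10, 11]`, but I may have overlooked some cases.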
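One possible shape for such a hybrid, as a sketch rather than a tuned implementation: keep the `unfold()` view, but loop over blocks of output rows so that only one block is materialized at a time. The function name `chunked_similarity` and the parameter `rows_per_chunk` are made up for this illustration:

```python
import torch


def chunked_similarity(image, kernel, stride=1, rows_per_chunk=8):
    # image: [b, c, hi, wi], kernel: [c, hk, wk]
    # rows_per_chunk (hypothetical knob): output rows processed per step.
    hk, wk = kernel.shape[-2:]
    patched = image.unfold(-2, hk, stride).unfold(-2, wk, stride).movedim(1, -3)
    # patched is still a view: [b, ho, wo, c, hk, wk]
    out = torch.empty(patched.shape[:3], dtype=image.dtype, device=image.device)
    for start in range(0, patched.shape[1], rows_per_chunk):
        chunk = patched[:, start:start + rows_per_chunk]
        # Only this block of windows is expanded in memory by sub().
        diff = chunk.sub(kernel).abs().sum(dim=(-3, -2, -1))
        out[:, start:start + rows_per_chunk] = 1 - diff / kernel.nelement()
    return out


torch.manual_seed(0)
img = torch.rand(2, 3, 40, 37)
knl = torch.rand(3, 10, 9)

# Reference: the unchunked computation on the full sliding-window view.
full_view = img.unfold(-2, 10, 2).unfold(-2, 9, 2).movedim(1, -3)
full = 1 - full_view.sub(knl).abs().sum(dim=(-3, -2, -1)) / knl.nelement()

chunked = chunked_similarity(img, knl, stride=2, rows_per_chunk=4)
```

Smaller values of `rows_per_chunk` lower peak memory at the cost of more Python-level iterations; the extreme of one row per chunk is still far fewer iterations than looping over every pixel position.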
