How does area interpolation for downsampling work when the scale factor is not an integer?

Question

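By definition, area interpolation simply weights each input pixel by the area it contributes.

So I thought that if the scale factor is 1.5, output pixel 00 would contain the full pixel 00, half of pixels 01 and 10, and 1/4 of pixel 11. The weights would be in_pixel_area/(1.5)^2.

However, that does not seem to be the case: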

import torch
import torch.nn.functional as F

x = torch.tensor(
    [[3, 106, 107, 40, 148, 112, 254, 151],
    [62, 173, 91, 93, 33, 111, 139, 25],
    [99, 137, 80, 231, 101, 204, 74, 219],
    [240, 173, 85, 14, 40, 230, 160, 152],
    [230, 200, 177, 149, 173, 239, 103, 74],
    [19, 50, 209, 82, 241, 103, 3, 87],
    [252, 191, 55, 154, 171, 107, 6, 123],
    [7, 101, 168, 85, 115, 103, 32, 11]],
dtype=torch.float).unsqueeze(0).unsqueeze(0)
print(x.shape, x.sum())
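for scale in [8/6, 2]:
    pixel_area = scale**2  # area of one output pixel, in input-pixel units
    y = F.interpolate(x, scale_factor=1/scale, mode="area")
    print(y.shape, y, y.sum()*pixel_area)
    # expected area-weighted value of the top-left output pixel
    print((3 + 106*(scale-1) + 62*(scale-1) + 173*(scale-1)**2)/pixel_area)
    # expected area-weighted value of the bottom-right output pixel
    print((11 + 123*(scale-1) + 32*(scale-1) + 6*(scale-1)**2)/pixel_area)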

The output is:

torch.Size([1, 1, 8, 8]) tensor(7707.)
torch.Size([1, 1, 6, 6]) tensor([[[[ 86.0000, 119.2500,  82.7500, 101.0000, 154.0000, 142.2500],
          [117.7500, 120.2500, 123.7500, 112.2500, 132.0000, 114.2500],
          [162.2500, 118.7500, 102.5000, 143.7500, 167.0000, 151.2500],
          [124.7500, 159.0000, 154.2500, 189.0000, 112.0000,  66.7500],
          [128.0000, 126.2500, 125.0000, 155.5000,  54.7500,  54.7500],
          [137.7500, 128.7500, 115.5000, 124.0000,  62.0000,  43.0000]]]]) tensor(7665.7778)
43.99999999999999
35.62499999999999
torch.Size([1, 1, 4, 4]) tensor([[[[ 86.0000,  82.7500, 101.0000, 142.2500],
          [162.2500, 102.5000, 143.7500, 151.2500],
          [124.7500, 154.2500, 189.0000,  66.7500],
          [137.7500, 115.5000, 124.0000,  43.0000]]]]) tensor(7707.)
86.0
43.0

We can see that it works as expected when scale = 2. But when scale = 8/6, it gives strange results.

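First, y.sum()*pixel_area does not equal x.sum(). Second, I tried computing the pixel value directly from the weights, and it gives 44 instead of 86. Third, I expected output pixel 00 to differ between the two scales, but it is 86 in both cases. Why?

Update: Looking more closely, it seems that when scale = 8/6 it just performs a 2x2 kernel average with a stride of 1x1. But doesn't that violate the definition of area interpolation?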

Answer 1

When you use mode="area", PyTorch computes the output with an adaptive average pooling operation. The relevant code is:

...
    if input.dim() == 4 and mode == "area":
        assert output_size is not None
        return adaptive_avg_pool2d(input, output_size)
...

You can verify this as follows:

x = ...
scale_factor = 0.75
pool1 = F.interpolate(x, scale_factor=scale_factor, mode="area")
output_size = [int(i*scale_factor) for i in x.shape[-2:]]
pool2 = F.adaptive_avg_pool2d(x, output_size)
pool1 == pool2

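Adaptive average pooling divides the input into chunks of roughly equal size and computes the simple average of each chunk. There is no per-pixel area weighting of the kind you describe. The adaptive pooling code and its indexing code are both in the PyTorch source.

Looking at the indexing code may help: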
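For contrast, the fractional-area weighting that the question expected can be sketched in plain Python. This is only an illustration of that definition, not what PyTorch does; overlap_weights and area_weighted_resize are hypothetical helper names, and exact fractions are used to avoid rounding noise. Under these assumptions it reproduces the hand-computed 44 and 35.625 from the question.

```python
from fractions import Fraction

def overlap_weights(out_size, in_size):
    """weights[i][j]: share of output cell i's extent that input pixel j covers."""
    s = Fraction(in_size, out_size)  # input pixels per output pixel
    weights = []
    for i in range(out_size):
        lo, hi = i * s, (i + 1) * s  # extent of output cell i in input coordinates
        row = []
        for j in range(in_size):
            overlap = min(hi, j + 1) - max(lo, j)  # overlap with input pixel [j, j+1)
            row.append(max(overlap, 0) / s)
        weights.append(row)
    return weights

def area_weighted_resize(img, out_h, out_w):
    """Hypothetical helper: true area-weighted downsampling (separable weights)."""
    in_h, in_w = len(img), len(img[0])
    wh = overlap_weights(out_h, in_h)
    ww = overlap_weights(out_w, in_w)
    return [[sum(wh[i][r] * ww[j][c] * img[r][c]
                 for r in range(in_h) for c in range(in_w))
             for j in range(out_w)]
            for i in range(out_h)]

x = [[3, 106, 107, 40, 148, 112, 254, 151],
     [62, 173, 91, 93, 33, 111, 139, 25],
     [99, 137, 80, 231, 101, 204, 74, 219],
     [240, 173, 85, 14, 40, 230, 160, 152],
     [230, 200, 177, 149, 173, 239, 103, 74],
     [19, 50, 209, 82, 241, 103, 3, 87],
     [252, 191, 55, 154, 171, 107, 6, 123],
     [7, 101, 168, 85, 115, 103, 32, 11]]

y = area_weighted_resize(x, 6, 6)
print(float(y[0][0]), float(y[5][5]))  # 44.0 35.625

# With true area weighting the total is conserved after rescaling by the pixel area.
total_conserved = sum(map(sum, y)) * Fraction(8, 6) ** 2 == sum(map(sum, x))
print(total_conserved)  # True
```

With this weighting, y.sum()*pixel_area equals x.sum(), which is the property the question was checking for and which mode="area" does not provide at non-integer scales.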

inline int64_t start_index(int64_t a, int64_t b, int64_t c) {
  return (a / b) * c + ((a % b) * c) / b;
}

inline int64_t end_index(int64_t a, int64_t b, int64_t c) {
  return 1 + ((a + 1) * c - 1) / b;
}

Here, a is the output position in a dimension of size b, mapped to an input of size c.
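The two C++ expressions are just integer-arithmetic forms of floor(a*c/b) and ceil((a+1)*c/b). A small sketch to check this, assuming non-negative indices and positive sizes (where Python's // behaves like C++ integer division); the _cpp suffixed names are only labels for the literal ports:

```python
def start_index_cpp(a, b, c):
    # Literal port of the C++ start_index expression.
    return (a // b) * c + ((a % b) * c) // b

def end_index_cpp(a, b, c):
    # Literal port of the C++ end_index expression.
    return 1 + ((a + 1) * c - 1) // b

def start_index(a, b, c):
    # floor(a * c / b)
    return (a * c) // b

def end_index(a, b, c):
    # ceil((a + 1) * c / b)
    return ((a + 1) * c + b - 1) // b

# Compare both forms for every output position a, output size b, input size c.
mismatches = [
    (a, b, c)
    for b in range(1, 33)
    for c in range(1, 33)
    for a in range(b)
    if start_index_cpp(a, b, c) != start_index(a, b, c)
    or end_index_cpp(a, b, c) != end_index(a, b, c)
]
print(len(mismatches))  # 0
```

So each output position a averages the input range from floor(a*c/b) up to, but not including, ceil((a+1)*c/b).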

For your example, scale_factor = 1/(8/6) = 0.75. The input has size (..., 8, 8), so the output has size (..., 6, 6) (int(0.75*8) = 6).

You can compute the value of a specific output element as follows:

def start_index(a, b, c):
    return (a * c) // b

def end_index(a, b, c):
    return ((a + 1) * c + b - 1) // b

input_height = 8
input_width = 8
output_height = 6
output_width = 6
out_row = 0
out_col = 1
h0 = start_index(out_row, output_height, input_height)
h1 = end_index(out_row, output_height, input_height)
w0 = start_index(out_col, output_width, input_width)
w1 = end_index(out_col, output_width, input_width)

kh = h1-h0
kw = w1-w0

x[:, :, h0:h1, w0:w1].sum() / (kh*kw)

Also note that for adaptive pooling the stride is not constant. For example:

for a in range(6):
    start = start_index(a, 6, 8)
    end = end_index(a, 6, 8)
    print(f"Position {a}: {start} -> {end}")

Position 0: 0 -> 2
Position 1: 1 -> 3
Position 2: 2 -> 4
Position 3: 4 -> 6 # note the jump here from 2 to 4
Position 4: 5 -> 7
Position 5: 6 -> 8
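Putting the pieces together, a dependency-free sketch of the same windowed averaging reproduces the 6x6 output from the question. adaptive_avg_pool2d_ref is a hypothetical reference helper written for this illustration, not a PyTorch function. It also suggests why y.sum()*pixel_area differs from x.sum(): the windows overlap, so some input rows and columns are counted twice.

```python
def start_index(a, b, c):
    return (a * c) // b

def end_index(a, b, c):
    return ((a + 1) * c + b - 1) // b

def adaptive_avg_pool2d_ref(img, out_h, out_w):
    """Hypothetical reference: plain mean over each adaptive window."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        h0, h1 = start_index(i, out_h, in_h), end_index(i, out_h, in_h)
        row = []
        for j in range(out_w):
            w0, w1 = start_index(j, out_w, in_w), end_index(j, out_w, in_w)
            window = [img[r][c] for r in range(h0, h1) for c in range(w0, w1)]
            row.append(sum(window) / len(window))  # unweighted average
        out.append(row)
    return out

x = [[3, 106, 107, 40, 148, 112, 254, 151],
     [62, 173, 91, 93, 33, 111, 139, 25],
     [99, 137, 80, 231, 101, 204, 74, 219],
     [240, 173, 85, 14, 40, 230, 160, 152],
     [230, 200, 177, 149, 173, 239, 103, 74],
     [19, 50, 209, 82, 241, 103, 3, 87],
     [252, 191, 55, 154, 171, 107, 6, 123],
     [7, 101, 168, 85, 115, 103, 32, 11]]

y = adaptive_avg_pool2d_ref(x, 6, 6)
print(y[0])  # [86.0, 119.25, 82.75, 101.0, 154.0, 142.25]

# Number of output windows that include each input index along one axis.
coverage = [sum(start_index(a, 6, 8) <= j < end_index(a, 6, 8) for a in range(6))
            for j in range(8)]
print(coverage)  # [1, 2, 2, 1, 1, 2, 2, 1]
```

Every window here is 2x2, so output pixel 00 is the mean of 3, 106, 62 and 173, which is 86 at both scales.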
