Why transpose in spatial batch normalization?

Question

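I am trying to write a function called spatial_batchnorm_forward for a convolutional neural network. Inside it, I want to reuse the batchnorm_forward function, which was implemented for inputs of shape (N, D) in fully connected networks. Here is the correct implementation.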

def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    """Computes the forward pass for spatial batch normalization.
    """
    out, cache = None, None

    N, C, H, W = x.shape
    x_ = x.transpose(0,2,3,1).reshape(N*H*W, C)
    out_, cache = batchnorm_forward(x_, gamma, beta, bn_param)
    out = out_.reshape(N, H, W, C).transpose(0,3,1,2)

    return out, cache

But at first I wrote it like this:

def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    """Computes the forward pass for spatial batch normalization.
    """
    out, cache = None, None

    N, C, H, W = x.shape
    x_ = x.reshape(-1, C)
    out_, cache = batchnorm_forward(x_, gamma, beta, bn_param)
    out = out_.reshape(N, C, H, W)

    return out, cache

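This code runs, which means the dimensions match, but the output is slightly different from the version above. I would like to know what is going on. Thank you very much for your patience and help!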

I guessed the problem was in the reshape function, so I read the documentation.

numpy.reshape(a, newshape, order='C')
Gives a new shape to an array without changing its data.

Parameters
a : array_like
Array to be reshaped.

newshape : int or tuple of ints
The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.

order : {‘C’, ‘F’, ‘A’}, optional
Read the elements of a using this index order, and place the elements into the reshaped array using this index order. ‘C’ means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to read / write the elements using Fortran-like index order, with the first index changing fastest, and the last index changing slowest. Note that the ‘C’ and ‘F’ options take no account of the memory layout of the underlying array, and only refer to the order of indexing. ‘A’ means to read / write the elements in Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise.

Returns
reshaped_array : ndarray
This will be a new view object if possible; otherwise, it will be a copy. Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.

But I still don't understand what is going on.

Answer 1

Let's work through an example to understand this better. Suppose we have the following NumPy array:

a = np.arange(2*3*2*4).reshape((2,3,2,4))

This array represents 2 images with 3 channels each, where every channel is 2 x 4 pixels:

[[[[ 0  1  2  3]
   [ 4  5  6  7]]

  [[ 8  9 10 11]
   [12 13 14 15]]

  [[16 17 18 19]
   [20 21 22 23]]]


 [[[24 25 26 27]
   [28 29 30 31]]

  [[32 33 34 35]
   [36 37 38 39]]

  [[40 41 42 43]
   [44 45 46 47]]]]

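To implement spatial batch normalization in the same way as ordinary batch normalization, we need to represent the image batch (N, C, H, W) as a 2D array of shape (N x H x W, C), where each row is one data point holding the 3 channel values of a single pixel. The number of features corresponds to the number of channels in the image, and the number of rows corresponds to the total number of pixels in the batch, i.e. N x H x W.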

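Therefore, we want each column to be a one-dimensional array holding all pixel values of one channel, taken across every image in the batch. A first step is to flatten the pixel values in each channel of each image, as follows: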

b = a.reshape(2,3,-1)
print(b)

[[[ 0  1  2  3  4  5  6  7]
  [ 8  9 10 11 12 13 14 15]
  [16 17 18 19 20 21 22 23]]

 [[24 25 26 27 28 29 30 31]
  [32 33 34 35 36 37 38 39]
  [40 41 42 43 44 45 46 47]]]

Next, we stack the images in the batch side by side along the flattened pixel axis and transpose the result:

c = b[0]
for i in range(1, 2):
    c = np.hstack((c, b[i]))
c.T

array([[ 0,  8, 16],
       [ 1,  9, 17],
       [ 2, 10, 18],
       [ 3, 11, 19],
       [ 4, 12, 20],
       [ 5, 13, 21],
       [ 6, 14, 22],
       [ 7, 15, 23],
       [24, 32, 40],
       [25, 33, 41],
       [26, 34, 42],
       [27, 35, 43],
       [28, 36, 44],
       [29, 37, 45],
       [30, 38, 46],
       [31, 39, 47]])

This is the final shape we are looking for, and it can be passed to the batchnorm_forward function.
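
As a quick sanity check (a sketch, not part of the original assignment code), the column means of this matrix should equal the per-channel means taken over the batch, height, and width axes of a. The names stacked and per_channel_mean are introduced here purely for illustration.

```python
import numpy as np

a = np.arange(2 * 3 * 2 * 4).reshape((2, 3, 2, 4))  # (N, C, H, W)

# Rebuild the stacked matrix from the steps above: flatten each channel,
# place the images side by side, then transpose to (N*H*W, C).
b = a.reshape(2, 3, -1)
stacked = np.hstack([b[i] for i in range(b.shape[0])]).T

# One mean per channel, computed over batch, height, and width.
per_channel_mean = a.mean(axis=(0, 2, 3))

print(stacked.shape)         # (16, 3)
print(stacked.mean(axis=0))  # [15.5 23.5 31.5]
print(per_channel_mean)      # [15.5 23.5 31.5]
```

Because the two results agree, column-wise statistics computed by a batch norm routine on this matrix are per-channel statistics of the original images.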

Now, let's consider what happens when we simply reshape, as in your first attempt:

a.reshape(-1, 3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23],
       [24, 25, 26],
       [27, 28, 29],
       [30, 31, 32],
       [33, 34, 35],
       [36, 37, 38],
       [39, 40, 41],
       [42, 43, 44],
       [45, 46, 47]])

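As you can see, this reshape simply picks the elements in order from start to finish and arranges them into the target array, so each column ends up containing values from all channels rather than from a single one.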
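
To make the consequence concrete, the following sketch compares the column means of the naively reshaped array with the true per-channel means. The names naive and true_channel_mean are illustrative only.

```python
import numpy as np

a = np.arange(2 * 3 * 2 * 4).reshape((2, 3, 2, 4))  # (N, C, H, W)

naive = a.reshape(-1, 3)                     # columns mix values from all channels
true_channel_mean = a.mean(axis=(0, 2, 3))   # one mean per channel

print(naive.mean(axis=0))   # [22.5 23.5 24.5]
print(true_channel_mean)    # [15.5 23.5 31.5]
```

The statistics differ, so normalizing the naive matrix column by column does not normalize each channel. The middle entries match here only because of the symmetry of this particular arange data.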

With this understanding, we can work out what we want the array to look like, so that a plain reshape then gives the desired result:

[[[[ 0  8 16]
   [ 1  9 17]
   [ 2 10 18]
   [ 3 11 19]]

  [[ 4 12 20]
   [ 5 13 21]
   [ 6 14 22]
   [ 7 15 23]]]


 [[[24 32 40]
   [25 33 41]
   [26 34 42]
   [27 35 43]]

  [[28 36 44]
   [29 37 45]
   [30 38 46]
   [31 39 47]]]]

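We can see that the last dimension is the number of channels (3), the second-to-last dimension corresponds to the image width, and the first dimension is the batch size. To achieve this, we can transpose the array: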

b = a.transpose(0,2,3,1)

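In this call, axis 3 of the original array is the width, axis 1 is the number of channels, and axis 0 is the batch size. Reshaping the transposed array with reshape(-1, 3) then yields the (N x H x W, C) matrix shown earlier.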
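
The whole round trip can be sketched as follows. Here normalize_columns is a hypothetical stand-in for batchnorm_forward (it only subtracts the column mean and divides by the column standard deviation, with no gamma, beta, or running averages), used so that the example runs on its own.

```python
import numpy as np

def normalize_columns(x2d, eps=1e-5):
    """Hypothetical stand-in for batchnorm_forward: zero mean, unit variance per column."""
    mean = x2d.mean(axis=0)
    var = x2d.var(axis=0)
    return (x2d - mean) / np.sqrt(var + eps)

a = np.arange(2 * 3 * 2 * 4, dtype=np.float64).reshape((2, 3, 2, 4))
N, C, H, W = a.shape

# Move channels to the last axis, then flatten everything else into rows.
x2d = a.transpose(0, 2, 3, 1).reshape(N * H * W, C)
print(x2d[:3])  # rows [0, 8, 16], [1, 9, 17], [2, 10, 18]

# Undo the two steps in reverse order to get back to (N, C, H, W).
restored = x2d.reshape(N, H, W, C).transpose(0, 3, 1, 2)
print(np.array_equal(restored, a))  # True

# Normalize per column, then restore the image layout.
out = normalize_columns(x2d).reshape(N, H, W, C).transpose(0, 3, 1, 2)
print(out.mean(axis=(0, 2, 3)))  # approximately 0 for every channel
```

The inverse step uses transpose(0, 3, 1, 2) because it has to move the channel axis from the last position back to position 1, which is why the correct implementation in the question ends with that call.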
