By the definition of a multi-layer RNN, the output of the first layer at each time step serves as the input to the second layer at the same time step, and so on (see here). However, in the RNN implementation described on the official PyTorch RNN page, the original input x[t] keeps being used across all layers at a given time step. The code below shows where this happens, inside the layer loop.
def forward(x, h_0=None):
    if batch_first:
        x = x.transpose(0, 1)
    seq_len, batch_size, _ = x.size()
    if h_0 is None:
        h_0 = torch.zeros(num_layers, batch_size, hidden_size)
    h_t_minus_1 = h_0
    h_t = h_0
    output = []
    for t in range(seq_len):
        for layer in range(num_layers):
            h_t[layer] = torch.tanh(
                x[t] @ weight_ih[layer].T
                + bias_ih[layer]
                + h_t_minus_1[layer] @ weight_hh[layer].T
                + bias_hh[layer]
            )
        output.append(h_t[-1])
        h_t_minus_1 = h_t
    output = torch.stack(output)
    if batch_first:
        output = output.transpose(0, 1)
    return output, h_t
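With the weight shapes documented for nn.RNN, the snippet above can't even run for num_layers > 1 unless input_size == hidden_size, since x[t] @ weight_ih[layer].T is a shape mismatch at layer 1. A quick check with arbitrary sizes:

import torch

input_size, hidden_size, batch_size = 10, 20, 3
# weight shapes as documented: (hidden_size, input_size) for layer 0,
# (hidden_size, hidden_size) afterwards
weight_ih = [torch.randn(hidden_size, input_size),
             torch.randn(hidden_size, hidden_size)]

x_t = torch.randn(batch_size, input_size)
h = x_t @ weight_ih[0].T  # OK: (3, 10) @ (10, 20) -> (3, 20)
x_t @ weight_ih[1].T      # RuntimeError: mat1 and mat2 shapes cannot be multiplied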
Isn't that exactly what I wrote? The input to the first hidden layer should be the original input x[t], and the input to the second hidden layer should then be the output of the first hidden layer, i.e. h[layer-1] (run through an affine transformation so its size is compatible with the weights and biases). Please see my code below.
def forward(self, x, h_0=None):
    batch_size, seq_len, _ = x.size()
    x = x.transpose(0, 1)  # make sequence first for ease of computation
    if h_0 is None:
        h_0 = torch.zeros(num_layers, batch_size, hidden_dim)
    h_t_minus_1 = h_0
    h_t = h_0
    output_list = []
    for t in range(seq_len):
        # sequential latent space
        for layer in range(num_layers):
            if layer == 0:
                current_input = x[t]
            else:
                current_input = F.linear(h_t[layer - 1], self.w_hh2.T, bias=None)
            h_t[layer] = torch.tanh(
                current_input @ self.w_ih[layer].T +
                h_t_minus_1[layer] @ self.w_hh[layer].T +
                self.b_hh[layer])
        output = F.linear(h_t[-1], self.w_oh, self.b_oh)
        output_list.append(output)
        h_t_minus_1 = h_t
    output_list = torch.stack(output_list).transpose(0, 1)
    return output_list
Unfortunately, however, my implementation raises an in-place modification error during backpropagation:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [32, 128]], which is output 0 of AsStridedBackward0, is at version 68; expected version 67 instead.
The error occurs on the line containing
h_t[layer] = torch.tanh(
    current_input @ self.w_ih[layer].T +
    h_t_minus_1[layer] @ self.w_hh[layer].T +
    self.b_hh[layer])
The code snippet is incorrect. For every layer after the first, the input to layer n is the output of layer n-1.
This is clarified in the num_layers section of the documentation:
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
This is also consistent with the shapes of the weights involved:
weight_ih_l[k] – the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
For layers after the first, weight_ih_l[k] has shape (hidden_size, num_directions * hidden_size), which is not compatible with the original input.
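This is easy to confirm on an actual nn.RNN instance (sizes here are arbitrary):

import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2)
print(rnn.weight_ih_l0.shape)  # torch.Size([20, 10]) -> multiplies the raw input x[t]
print(rnn.weight_ih_l1.shape)  # torch.Size([20, 20]) -> multiplies layer 0's hidden output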
I think this is just an oversight in the documentation. Under the hood, PyTorch uses CUDA routines for its RNNs, so the code in the docs does not reflect the actual implementation.
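For reference, here is a minimal sketch of what the docs snippet presumably intended (same made-up names as the docs code, so still not the real implementation): each layer after the first consumes the output of the layer below at the current time step, and the per-layer states live in Python lists instead of being written into one tensor, which also avoids the kind of in-place autograd error you hit.

def forward(x, h_0=None):
    if batch_first:
        x = x.transpose(0, 1)
    seq_len, batch_size, _ = x.size()
    if h_0 is None:
        h_0 = torch.zeros(num_layers, batch_size, hidden_size)
    h_t_minus_1 = list(h_0)  # one (batch, hidden) tensor per layer
    output = []
    for t in range(seq_len):
        h_t = []
        for layer in range(num_layers):
            # layer 0 reads the raw input; deeper layers read the
            # output of the layer below at this same time step
            layer_input = x[t] if layer == 0 else h_t[layer - 1]
            h_t.append(torch.tanh(
                layer_input @ weight_ih[layer].T
                + bias_ih[layer]
                + h_t_minus_1[layer] @ weight_hh[layer].T
                + bias_hh[layer]
            ))
        output.append(h_t[-1])
        h_t_minus_1 = h_t  # rebind the list; no in-place tensor writes
    output = torch.stack(output)
    if batch_first:
        output = output.transpose(0, 1)
    return output, torch.stack(h_t)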