I have the following training code. I'm fairly sure I only call loss.backward() once, yet I get the error in the title. What am I doing wrong? Note that X_train_tensor is the output of another graph's computation, so it has requires_grad=True, as you can see from the print statement. Is that the root of the problem? If so, how do I change it? It won't let me toggle the flag directly on the tensor.
for iter in range(max_iters):
    start_ix = 0
    loss = None
    while start_ix < len(X_train_tensor):
        loss = None
        end_ix = min(start_ix + batch_size, len(X_train_tensor))
        out, loss, accuracy = model(X_train_tensor[start_ix:end_ix], y_train_tensor[start_ix:end_ix])

        # every once in a while evaluate the loss on train and val sets
        if (start_ix == 0) and (iter % 10 == 0 or iter == max_iters - 1):
            out_val, loss_val, accuracy_val = model(X_val_tensor, y_val_tensor)
            print(f"step {iter}: train loss={loss:.2f} train_acc={accuracy:.3f} | val loss={loss_val:.2f} val_acc={accuracy_val:.3f} {datetime.datetime.now()}")

        optimizer.zero_grad(set_to_none=True)
        print(iter, start_ix, X_train_tensor.requires_grad, y_train_tensor.requires_grad, loss.requires_grad)
        loss.backward()
        optimizer.step()
        start_ix = end_ix + 1
Here is the error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Update: here is where the model's input tensors come from; they are the outputs of another (autoencoder) model:
autoencoder.eval()
with torch.no_grad():  # it seems like adding this line solves the problem?
    X_train_encoded, loss = autoencoder(X_train_tensor)
    X_val_encoded, loss = autoencoder(X_val_tensor)
    X_test_encoded, loss = autoencoder(X_test_tensor)
Adding the with torch.no_grad() line above fixes the problem, but I don't understand why. Does it actually change how the outputs are generated? How does that work?
As I understand it, X_train_tensor is the output of your autoencoder. When you do not run the encoding step under torch.no_grad(), a computation graph is created for the autoencoder's output, linking the autoencoder's operations and weights to the encoded tensor. Since your model's loss is computed from X_train_tensor, that loss is connected to the autoencoder's computation graph.
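You can see this linkage by inspecting grad_fn. Here is a minimal sketch with made-up stand-in modules (plain nn.Linear layers, not your actual models):

import torch
import torch.nn as nn

autoencoder = nn.Linear(8, 4)   # hypothetical stand-in for the autoencoder
classifier = nn.Linear(4, 2)    # hypothetical stand-in for the downstream model

x = torch.randn(16, 8)

# Encoding WITHOUT no_grad: the output stays attached to the autoencoder's graph
encoded = autoencoder(x)
print(encoded.requires_grad, encoded.grad_fn)   # True <AddmmBackward0 ...>

# Any loss computed from `encoded` is attached to that same graph,
# so backward() through it will also traverse the autoencoder's operations
loss = classifier(encoded).sum()
print(loss.requires_grad)                       # True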
The first time you call loss.backward(), PyTorch walks the entire computation graph (including the autoencoder part) to compute gradients, and then frees the graph's saved intermediate values. When you call loss.backward() again in the second iteration of the loop, you are trying to walk back through the autoencoder's portion of the graph, which has already been freed.
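Here is a minimal reproduction of that failure mode (again with hypothetical stand-in modules and random data): both batch losses are sliced from the same encoded tensor, so they share the autoencoder portion of the graph, and the second backward() hits buffers that the first backward() already freed:

import torch
import torch.nn as nn

autoencoder = nn.Linear(8, 4)            # hypothetical encoder
model = nn.Linear(4, 2)                  # hypothetical downstream model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X_raw = torch.randn(6, 8)
X_encoded = autoencoder(X_raw)           # graph: X_raw -> autoencoder -> X_encoded

for start in (0, 3):                     # two "batches" sliced from the same tensor
    batch_loss = model(X_encoded[start:start + 3]).sum()
    optimizer.zero_grad(set_to_none=True)
    batch_loss.backward()                # 2nd iteration raises "Trying to backward
    optimizer.step()                     # through the graph a second time ..."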
torch.no_grad() prevents PyTorch from building the autoencoder's computation graph in the first place, so the losses you compute later are never linked to the autoencoder.
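Under torch.no_grad() the forward pass records no history at all, so the encoded tensor comes out as a plain leaf tensor, and each batch loss then gets its own small graph that starts at your downstream model. A small sketch with the same hypothetical stand-ins as above:

import torch
import torch.nn as nn

autoencoder = nn.Linear(8, 4)            # hypothetical encoder
X_raw = torch.randn(6, 8)

with torch.no_grad():
    X_encoded = autoencoder(X_raw)

# No history was recorded: the encoded tensor is cut off from the autoencoder
print(X_encoded.requires_grad, X_encoded.grad_fn)   # False None

If you cannot rerun the encoding step, detaching the already-encoded tensor (for example X_train_encoded = X_train_encoded.detach()) has the same effect of cutting it off from the autoencoder's graph.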