Is this code an accurate representation of gradient accumulation in PyTorch?

Question

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Turn on gradient accumulation
for param in model.parameters():
    param.requires_grad = True
    param.grad = None

# Set AccumulateGrad to True
optimizer.zero_grad()
accumulated_gradients = None

for i in range(5):  # Simulating 5 batches
    input_data = torch.randn(1, 10)
    output = model(input_data)
    loss = output.sum()
    loss.backward(retain_graph=True)  # Accumulate gradients

    if accumulated_gradients is None:
        accumulated_gradients = [param.grad for param in model.parameters()]
    else:
        for j, param in enumerate(model.parameters()):
            accumulated_gradients[j] += param.grad
    
    print(f"Gradients after batch {i + 1} accumulated.")

# Update the parameters using the accumulated gradients
for param, accumulated_gradient in zip(model.parameters(), accumulated_gradients):
    param.grad = accumulated_gradient
optimizer.step()

# Print the updated weights
print(f"Weights after 5 batches update: {model.fc.weight.data}")

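In this code, I tried to set up an example showing how gradient accumulation works in PyTorch.

The code first collects (sums) the gradients over the batches, then updates the weights after the fifth iteration. Is this code correct?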

Answer 1

If you want to accumulate gradients in PyTorch, you simply need to avoid calling zero_grad() between backward passes. Let me give you an example.

import torch
from torch import nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(5, 1)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()
print(model.fc.weight.grad)
>>> None

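Here I created a simple model. Before doing anything else, I checked the gradient of the weights of the fc layer and, as you can see, it has no gradient yet (its gradient is None). Now let's accumulate some gradients!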

input_data = torch.randn(1, 5)
for i in range(5):
    output = model(input_data)
    loss = output.sum()
    loss.backward()
    
    print(f'grad after {i+1} iteration:', model.fc.weight.grad)
    # No model.zero_grad() here: zeroing would reset the gradient each iteration
>>> grad after 1 iteration: tensor([[-0.3563,  0.3777,  1.7307,  1.8210,  0.1536]])
>>> grad after 2 iteration: tensor([[-0.7126,  0.7553,  3.4615,  3.6419,  0.3071]])
>>> grad after 3 iteration: tensor([[-1.0690,  1.1330,  5.1922,  5.4629,  0.4607]])
>>> grad after 4 iteration: tensor([[-1.4253,  1.5107,  6.9229,  7.2839,  0.6143]])
>>> grad after 5 iteration: tensor([[-1.7816,  1.8883,  8.6536,  9.1048,  0.7679]])

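I fed input_data into the network, computed a loss, and then started the backward pass by calling loss.backward(). Then I printed the gradient of the weights. As you can see, the gradients are being added up (accumulated). Since my input and loss do not change between iterations, the weights receive the same gradient each time, and it is added to what has been accumulated so far. So in this example, the gradient accumulated after 5 iterations is 5 times the gradient after 1 iteration. Finally, to update the weights using the accumulated gradients, you can add the following code: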

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.step()
optimizer.zero_grad() # Maybe zero out the gradients if you don't want them anymore
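
To tie this back to the question, here is a minimal sketch of the same idea with five different batches instead of one repeated input. It assumes a bare nn.Linear(5, 1) as a stand-in for SimpleModel, and the names batches, expected, accumulated and weight_before are only illustrative. It checks that .grad already holds the sum of the per-batch gradients, so no separate list is needed. If you do keep your own list, note that [param.grad for param in model.parameters()] stores references to the same tensors rather than copies, so you would want .clone() there.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

model = nn.Linear(5, 1)  # stand-in for SimpleModel's single layer (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [torch.randn(1, 5) for _ in range(5)]  # five different inputs

# Reference: compute each batch's gradient on its own and sum copies of them.
expected = torch.zeros_like(model.weight)
for x in batches:
    model.zero_grad()
    model(x).sum().backward()
    expected += model.weight.grad.clone()

# Accumulation: call backward() repeatedly without zeroing in between.
optimizer.zero_grad()
for x in batches:
    loss = model(x).sum()
    loss.backward()  # adds this batch's gradient to .grad

accumulated = model.weight.grad.clone()
matches = torch.allclose(accumulated, expected)
print("accumulated equals sum of per-batch grads:", matches)

weight_before = model.weight.detach().clone()
optimizer.step()       # one update using the accumulated gradient
optimizer.zero_grad()  # reset before the next accumulation cycle
```

If you want the average gradient rather than the sum, a common choice is to divide each loss by the number of batches before calling backward().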
