Is this code an accurate illustration of gradient accumulation in PyTorch?

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Turn on gradient accumulation
for param in model.parameters():
    param.requires_grad = True
    param.grad = None

# Set AccumulateGrad to True
optimizer.zero_grad()
accumulated_gradients = None

for i in range(5):  # Simulating 5 batches
    input_data = torch.randn(1, 10)
    output = model(input_data)
    loss = output.sum()
    loss.backward(retain_graph=True)  # Accumulate gradients

    if accumulated_gradients is None:
        accumulated_gradients = [param.grad for param in model.parameters()]
    else:
        for j, param in enumerate(model.parameters()):
            accumulated_gradients[j] += param.grad
    
    print(f"Gradients after batch {i + 1} accumulated.")

# Update the parameters using the accumulated gradients
for param, accumulated_gradient in zip(model.parameters(), accumulated_gradients):
    param.grad = accumulated_gradient
optimizer.step()

# Print the updated weights
print(f"Weights after 5 batches update: {model.fc.weight.data}")

In this code, I am trying to set up an example of how gradient accumulation works in PyTorch.

The code first collects the gradients, i.e. sums the gradient values, and after the fifth iteration it updates the weights. Is this code correct?

deep-learning pytorch
1 Answer

If you want to accumulate gradients in PyTorch, all you need to do is not call zero_grad() in your code. Let me give you an example:

import torch
from torch import nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(5, 1)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()
print(model.fc.weight.grad)
>>> None

Here I created a simple model. Before starting, I checked the gradient of the weights of the fc layer, and as you can see it has no gradient yet (its grad is None). Now, let's accumulate some gradients!

input_data = torch.randn(1, 5)
for i in range(5):
    output = model(input_data)
    loss = output.sum()
    loss.backward()
    
    print(f'grad after {i+1} iteration:', model.fc.weight.grad)
>>> grad after 1 iteration: tensor([[-0.3563,  0.3777,  1.7307,  1.8210,  0.1536]])
>>> grad after 2 iteration: tensor([[-0.7126,  0.7553,  3.4615,  3.6419,  0.3071]])
>>> grad after 3 iteration: tensor([[-1.0690,  1.1330,  5.1922,  5.4629,  0.4607]])
>>> grad after 4 iteration: tensor([[-1.4253,  1.5107,  6.9229,  7.2839,  0.6143]])
>>> grad after 5 iteration: tensor([[-1.7816,  1.8883,  8.6536,  9.1048,  0.7679]])

I fed input_data into the network, computed a loss, and started the backward pass by calling loss.backward(). Then I printed the gradient of the weights. As you can see, the gradients are being added together (accumulated). Since my input and loss do not change from one iteration to the next, the weights receive the same gradient each time and add it to what has been accumulated so far; in this example, the accumulated gradient after 5 iterations is therefore 5 times the gradient after 1 iteration. Finally, to update the weights with the accumulated gradients, you can add the following code:

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.step()
optimizer.zero_grad() # Maybe zero out the gradients if you don't want them anymore
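
For completeness, here is a minimal sketch of how gradient accumulation is usually written inside a training loop. The names (accumulation_steps, the fake batches list) are made up for illustration, and scaling the loss by accumulation_steps is optional, depending on whether you want the update to behave like one larger batch:

import torch
from torch import nn

model = nn.Linear(5, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

accumulation_steps = 4                            # accumulate over 4 mini-batches
batches = [torch.randn(8, 5) for _ in range(8)]   # stand-in for a real DataLoader

optimizer.zero_grad()
for i, x in enumerate(batches):
    loss = model(x).sum()
    # Scale the loss so the accumulated gradient equals the gradient of the
    # average loss over the accumulation window; backward() adds it into .grad
    (loss / accumulation_steps).backward()

    if (i + 1) % accumulation_steps == 0:
        optimizer.step()        # update with the accumulated gradients
        optimizer.zero_grad()   # reset .grad before the next accumulation window

Dividing the loss by accumulation_steps makes the accumulated gradient equal to the gradient of the mean loss over the window, so the parameter update mimics a single larger batch rather than the sum of several separate updates.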