import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Turn on gradient accumulation
for param in model.parameters():
    param.requires_grad = True
    param.grad = None

# Set AccumulateGrad to True
optimizer.zero_grad()
accumulated_gradients = None

for i in range(5):  # Simulating 5 batches
    input_data = torch.randn(1, 10)
    output = model(input_data)
    loss = output.sum()
    loss.backward(retain_graph=True)  # Accumulate gradients
    if accumulated_gradients is None:
        accumulated_gradients = [param.grad for param in model.parameters()]
    else:
        for j, param in enumerate(model.parameters()):
            accumulated_gradients[j] += param.grad
    print(f"Gradients after batch {i + 1} accumulated.")

# Update the parameters using the accumulated gradients
for param, accumulated_gradient in zip(model.parameters(), accumulated_gradients):
    param.grad = accumulated_gradient
optimizer.step()

# Print the updated weights
print(f"Weights after 5 batches update: {model.fc.weight.data}")
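In this code I tried to set up an example showing how gradient accumulation works in PyTorch.
The code first collects, or sums, the gradient values, and after the fifth iteration it updates the weights. Is this code correct?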
If you want to accumulate gradients in PyTorch, you simply need to avoid calling zero_grad() between backward passes. Let me give you an example.
import torch
from torch import nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(5, 1)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()
print(model.fc.weight.grad)
>>> None
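Here I created a simple model. Before starting, I checked the gradient of the weights of layer fc, and as you can see it has no gradient yet (its gradient is None). Now let's accumulate some gradients!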
input_data = torch.randn(1, 5)
for i in range(5):
    output = model(input_data)
    loss = output.sum()
    loss.backward()
    print(f'grad after {i+1} iteration:', model.fc.weight.grad)
    # model.zero_grad()  # left commented out: calling it here would reset the gradients every iteration
>>> grad after 1 iteration: tensor([[-0.3563, 0.3777, 1.7307, 1.8210, 0.1536]])
>>> grad after 2 iteration: tensor([[-0.7126, 0.7553, 3.4615, 3.6419, 0.3071]])
>>> grad after 3 iteration: tensor([[-1.0690, 1.1330, 5.1922, 5.4629, 0.4607]])
>>> grad after 4 iteration: tensor([[-1.4253, 1.5107, 6.9229, 7.2839, 0.6143]])
>>> grad after 5 iteration: tensor([[-1.7816, 1.8883, 8.6536, 9.1048, 0.7679]])
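I fed input_data to the network, computed a loss, and then started the backward pass by calling loss.backward(). Then I printed the gradient of the weights. As you can see, the gradients are being added up (accumulated). Because my input and loss do not change between iterations, the weights receive the same gradient each time, and it is added to whatever has accumulated so far. So in this example the gradient accumulated after 5 iterations is 5 times the gradient after 1 iteration. Finally, to update the weights with the accumulated gradients, you can add the following code: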
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.step()
optimizer.zero_grad() # Maybe zero out the gradients if you don't want them anymore
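
Putting the pieces together, here is a minimal sketch of the whole accumulate-then-step cycle with five different batches, closer to the setup in the question. It is an illustration under a few assumptions, not the only way to do this: a bare nn.Linear(5, 1) stands in for SimpleModel, the names batches, weight_before, accumulated and expected are made up for the example, and the optimizer is plain SGD with no momentum or weight decay. Since the loss is output.sum() on a linear layer, each batch's weight gradient is just the batch itself, which gives an easy value to check against.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

model = nn.Linear(5, 1)  # assumption: stand-in for SimpleModel above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [torch.randn(1, 5) for _ in range(5)]  # five different batches

weight_before = model.weight.detach().clone()

optimizer.zero_grad()  # start the accumulation window from a clean state
for batch in batches:
    loss = model(batch).sum()
    loss.backward()  # adds this batch's gradient into each param.grad

# Copy the accumulated gradient before it is cleared further down
accumulated = model.weight.grad.detach().clone()

# For loss = output.sum(), the weight gradient of each batch is the batch itself,
# so the accumulated gradient should equal the sum of the batches
expected = torch.stack(batches).sum(dim=0)

optimizer.step()       # plain SGD: weight <- weight - lr * grad
optimizer.zero_grad()  # reset before the next accumulation window

print(torch.allclose(accumulated, expected, atol=1e-6))
print(torch.allclose(model.weight.detach(), weight_before - 0.1 * accumulated, atol=1e-6))
```

Both checks should print True. No separate list of summed gradients is needed, because backward() already adds into param.grad. If you want the average rather than the sum over the five batches, one common option is to divide each loss by the number of batches before calling backward().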