I want to build a recursive model that solves the simplest sequence I know of: the arithmetic progression. With base a and step d, the sequence looks like:

a, a+d, a+2d, a+3d, a+4d, ...
To solve it, writing the hidden state as h, the model only has to learn a simple 2×2 matrix. This effectively sets h_{t+1} = t_t (so h_1 = t_0, and the hidden state always carries the previous term):

[h_{t+1}]   [ 0  1] [h_t]
[  y_t  ] = [-1  2] [t_t]

In other words, you can also look at it this way:

y_t = 2*t_t - h_t = t_t + (t_t - t_{t-1}) = t_t + d

So this model with a single 2×2 fully connected layer should be able to learn that matrix:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(2, 2, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        return x
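As a sanity check (my own addition, not part of the original code), hard-coding that matrix into fc1 and unrolling the recursion reproduces the progression exactly from the second step on; only the first prediction is off, because h_0 = 0 carries no history:

import torch

# Hand-set W = [[0, 1], [-1, 2]] in the Model defined above and unroll
# the recursion on a concrete progression, a = 3, d = 2 -> 3, 5, 7, 9, ...
model = Model()
with torch.no_grad():
    model.fc1.weight.copy_(torch.tensor([[0.0, 1.0], [-1.0, 2.0]]))

h = torch.zeros(1)
for t in torch.tensor([3.0, 5.0, 7.0, 9.0]):
    out = model(torch.cat([h, t.unsqueeze(0)]))
    h, y_hat = out[0].unsqueeze(0), out[1]
    print(float(y_hat))  # 6.0, 7.0, 9.0, 11.0 -- exact from step 2 onward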
But to my surprise, it does not converge! Something must be wrong with my setup, and I would appreciate help finding it. I suspect the problem is in my training loop.
P.S. I have deliberately set the batch size to 1 for now; I want to pad the input data later. Either way, the model should be able to learn without batching.
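For that padding later on, I imagine a collate function roughly like this (just a sketch of my own for future use, not used anywhere below; collate is a hypothetical helper):

import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical collate_fn for batch_size > 1: pad the variable-length
# sequences in a batch up to the longest one (padding_value defaults to 0).
def collate(batch):
    lengths, seqs, targets = zip(*batch)
    padded = pad_sequence(seqs, batch_first=True)  # shape (B, T_max)
    return torch.tensor(lengths), padded, torch.stack(targets)

# Usage: DataLoader(dataset, batch_size=B, collate_fn=collate)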
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import numpy as np
class CustomDataset(Dataset):
    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        a0 = (np.random.rand() - 0.5) * 200  # base in [-100, 100)
        d = (np.random.rand() - 0.5) * 40    # step in [-20, 20)
        length = np.random.randint(2, MAX_Length_sequence + 1)
        sequence = np.arange(length) * d + a0
        next_number = sequence[-1] + d
        return length, torch.tensor(sequence, dtype=torch.float32), torch.tensor(next_number, dtype=torch.float32)
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(2, 2, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        return x
# Hyperparameters
EPOCHS = 10
BATCH_SIZE = 1
LEARNING_RATE = 0.001
DATASET_SIZE = 10000
MAX_Length_sequence = 10  # assumed value; the original snippet never defines it
criterion = nn.MSELoss()
# Model
model = Model()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
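To make the data concrete, here is what one sample looks like (my own addition):

# Peek at a single sample: a random-length prefix of a progression
# plus its next term as the target.
length, seq, nxt = CustomDataset(1)[0]
print(length, seq, nxt)  # e.g. 4, tensor([a, a+d, a+2d, a+3d]), tensor(a+4d)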
My training loop:
for epoch in range(EPOCHS):
    dataset = CustomDataset(DATASET_SIZE)
    dataloader = DataLoader(dataset, batch_size=BATCH_SIZE)
    model.train()
    total_loss = 0
    for length, sequence, next_number in dataloader:
        optimizer.zero_grad()
        loss = 0
        h = torch.zeros(BATCH_SIZE)  # initial hidden state
        for i in range(length):
            # feed [hidden state, current term]; read [new hidden state, prediction]
            x = torch.cat([h, sequence[0, i].unsqueeze(0)])
            y = sequence[0, i + 1] if i != length - 1 else next_number[0]
            output = model(x)
            h, y_hat = output[0].unsqueeze(0), output[1]
            loss += criterion(y_hat, y)  # accumulate the loss over every step
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {total_loss/len(dataloader)}')
I worked around this by taking the loss only from the last output instead of collecting the losses from all steps and summing them. That fixed my problem, but I still don't understand why my first approach doesn't work!
for epoch in range(EPOCHS):
    dataset = CustomDataset(10000)
    dataloader = DataLoader(dataset, batch_size=BATCH_SIZE)
    model.train()
    total_loss = 0
    for length, sequence, next_number in dataloader:
        optimizer.zero_grad()
        h = torch.zeros(BATCH_SIZE)
        for i in range(length):
            x = torch.cat([h, sequence[0, i].unsqueeze(0)])
            h = model(x)[0].unsqueeze(0)
            # only the final step's prediction contributes to the loss
            if i == length - 1: loss = criterion(model(x)[1], next_number[0])
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {total_loss/len(dataloader)}')
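For reference, this is how I inspect the result afterwards (my own addition); [[0, 1], [-1, 2]] is one valid solution, although other parameterizations can predict the sequence equally well:

# Inspect the learned 2x2 matrix after training.
print(model.fc1.weight.detach())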