Why won't my RNN converge on a simple task?


I want to build a recurrent model for the simplest sequence I can think of: an arithmetic progression. With a as the base and d as the step, the sequence is:

a, a+d, a+2d, a+3d, a+4d, ...

To solve it, denoting the hidden state by h, the model only has to learn a simple 2*2 matrix. This effectively amounts to setting h1 = t0.

[image: the update rule written as a 2*2 matrix equation]

In other words, you can also look at it this way:

[image: the same update, viewed differently]
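Since the images are not reproduced here, the following is my reconstruction of the matrix form described by the text above (the hidden state simply carries the previous term forward):

[ h_{i+1} ]   [  0   1 ] [ h_i ]
[ t_{i+1} ] = [ -1   2 ] [ t_i ]

i.e. h_{i+1} = t_i and t_{i+1} = 2*t_i - h_i, which equals t_i + d whenever h_i = t_{i-1}.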

So this model with a single 2*2 fully connected layer should be able to learn that matrix:

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(2, 2, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        return x
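As a sanity check that such a matrix exists (a minimal sketch based on my reconstruction above, not taken from the original post), you can hard-code the weights and step through a short progression:

import torch
import torch.nn as nn

# hand-crafted solution: new hidden state = current term, prediction = 2*t_i - h_i
fc = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    fc.weight.copy_(torch.tensor([[0.0, 1.0],
                                  [-1.0, 2.0]]))

a0, d = 5.0, 3.0                      # sequence 5, 8, 11, ...
h = torch.zeros(1)
for t in [a0, a0 + d, a0 + 2 * d]:
    out = fc(torch.cat([h, torch.tensor([t])]))
    h, y_hat = out[0].unsqueeze(0), out[1]
print(y_hat.item())                   # 14.0 == a0 + 3*d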

But to my surprise, it does not converge! There must be something wrong in my setup, and I would appreciate help finding it. I suspect the problem is in my training loop.

P.S. I am deliberately using a batch size of 1 for now; I want to pad the input data later. In any case, the model should be able to learn without batching.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import numpy as np

class CustomDataset(Dataset):
    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        a0 = (np.random.rand() - 0.5) * 200   # random base in [-100, 100)
        d = (np.random.rand() - 0.5) * 40     # random step in [-20, 20)
        length = np.random.randint(2, MAX_Length_sequence + 1)

        sequence = np.arange(length) * d + a0
        next_number = sequence[-1] + d        # value the model should predict

        return length, torch.tensor(sequence, dtype=torch.float32), torch.tensor(next_number, dtype=torch.float32)

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(2, 2, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        return x

# Hyperparameters
EPOCHS = 10
BATCH_SIZE = 1
LEARNING_RATE = 0.001
DATASET_SIZE = 10000
MAX_Length_sequence = 10  # value not shown in the post; assumed here so the snippet runs
criterion = nn.MSELoss()

# Model
model = Model()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

My training loop:

for epoch in range(EPOCHS):
    dataset = CustomDataset(DATASET_SIZE)
    dataloader = DataLoader(dataset, batch_size=BATCH_SIZE)
    model.train()
    total_loss = 0

    for length, sequence, next_number in dataloader:
        optimizer.zero_grad()
        loss = 0
        h = torch.zeros(BATCH_SIZE)  # hidden state, initialised to zero

        for i in range(length):
            # model input: [hidden state, current term]
            x = torch.cat([h, sequence[0, i].unsqueeze(0)])
            # target: the next term, or the held-out next_number at the last step
            y = sequence[0, i + 1] if i != length - 1 else next_number[0]

            output = model(x)
            # model output: [new hidden state, prediction of the next term]
            h, y_hat = output[0].unsqueeze(0), output[1]

            loss += criterion(y_hat, y)

        loss.backward()
        optimizer.step()
        total_loss += loss.item() 
        
    print(f'Epoch {epoch+1}, Loss: {total_loss/len(dataloader)}')
1 Answer

I worked around this by taking the loss only from the last output instead of accumulating the loss at every step and summing it. That fixed the problem, but I still don't understand why my first approach didn't work!

for epoch in range(EPOCHS):
    dataset = CustomDataset(10000)
    dataloader = DataLoader(dataset, batch_size=BATCH_SIZE)
    model.train()
    total_loss = 0

    for length, sequence, next_number in dataloader:
        optimizer.zero_grad()
        h = torch.zeros(BATCH_SIZE)

        for i in range(length):
            x = torch.cat([h, sequence[0, i].unsqueeze(0)])
            output = model(x)
            h = output[0].unsqueeze(0)

            # only the prediction at the very last step contributes to the loss
            if i == length - 1:
                loss = criterion(output[1], next_number[0])
            
        loss.backward()
        optimizer.step()
        total_loss += loss.item() 
        
    print(f'Epoch {epoch+1}, Loss: {total_loss/len(dataloader)}')
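As a quick post-training check (my addition, not part of the original answer): the solution is only unique up to the scale of the hidden state, so if training has converged you should see roughly W[0,0] ≈ 0, W[1,1] ≈ 2 and W[0,1] * W[1,0] ≈ -1 rather than one fixed matrix:

W = model.fc1.weight.detach()
print(W)
print(W[0, 0].item(), W[1, 1].item(), (W[0, 1] * W[1, 0]).item())  # expect ~0, ~2, ~-1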