My task is to create an artificial neural network that uses an evolution strategies (ES) algorithm as the optimizer (derivative-free, no backprop). The dataset I'm using is MNIST. For now, I'm just trying to get this working with a plain fully connected network.
I found a Colab notebook that does the same thing, but on sklearn's "make_moons" dataset. I tried merging the notebook's code into mine; it runs without errors, but it keeps outputting the same accuracy. Typically the first few outputs differ, and then it "converges" to 0.0987 on the training set and 0.098 on the test set (essentially chance level for ten classes). Training also takes a very long time; maybe there are redundant iterations?
Here is the Colab notebook, if you want to take a look: https://colab.research.google.com/drive/1SY38Evy4U9HfUDkofPZ2pLQzEnwvYC81?usp=sharing
I tried some suggestions from StackOverflow, such as tuning the hyperparameters (learning rate, hidden units) and switching to Leaky ReLU in case of "dying ReLU"; none of them worked. This leads me to believe the problem is in the ES optimizer.
I'm new to PyTorch, so please point out anything obviously wrong!
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import numpy as np
from tqdm import tqdm
# Set device to CUDA
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# NN & DataLoader hyperparameters
input_size = 784
num_classes = 10
learning_rate = 0.01
batch_size = 64
num_epochs = 1
# Load data
train_dataset = datasets.MNIST(root='dataset/', train=True, transform=transforms.ToTensor(), download=False)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_dataset = datasets.MNIST(root='dataset/', train=False, transform=transforms.ToTensor(), download=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=True)
# Connected NN
model = nn.Sequential(
    nn.Linear(input_size, 40),
    nn.ReLU(0.1),
    nn.Linear(40, 20),
    nn.ReLU(0.1),
    nn.Linear(20, num_classes),
    nn.ReLU(0.1),
)
model = model.float().to(device)
# Custom loss function
loss_func = nn.CrossEntropyLoss()
def loss(y_pred, y_true):
    # We are maximizing in ES, so take the reciprocal of the loss;
    # now, increasing loss means the model is learning
    return 1 / loss_func(y_pred, y_true)
# Fitness function
def fitness_func(solution, scores, targets):
    # solution is a vector of parameters, like mother_parameters
    nn.utils.vector_to_parameters(solution, model.parameters())
    return loss(scores, targets)
# In ES, our population is a slightly altered version of the mother parameters, so we implement a jitter function
def jitter(mother_params, state_dict):
    params_try = mother_params + SIGMA * state_dict.to(device)
    return params_try
# Now, we calculate the fitness of entire population
def calculate_population_fitness(pop, mother_vector, scores, targets):
    fitness = torch.zeros(pop.shape[0])
    for i, params in enumerate(pop):
        p_try = jitter(mother_vector, params)
        fitness[i] = fitness_func(p_try, scores, targets)
    return fitness
# Calculating number of parameters
n_params = nn.utils.parameters_to_vector(model.parameters()).shape[0]
# Now, implementing the training algorithm
mother_parameters = model.parameters()
mother_vector = nn.utils.parameters_to_vector(mother_parameters)
# ES hyperparameters
SIGMA = 0.01
LR = 0.01
POPULATION_SIZE = 50
ITERATIONS = 500
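# The update in the training loop below implements the standard ES gradient estimate,
#   mother_vector += (LR / (POPULATION_SIZE * SIGMA)) * sum_i fitness_i * eps_i,
# which assumes each fitness_i is measured at the perturbed parameters
# mother_vector + SIGMA * eps_i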
# Train network
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):
        data = data.to(device=device)
        targets = targets.to(device=device)
        # Correcting shape
        data = data.reshape(data.shape[0], -1)
        scores = model(data)
        print(f"{batch_idx} out of {len(train_loader)}")
        # ES optimizer
        with torch.no_grad():  # No need for gradients
            for iteration in tqdm(range(ITERATIONS)):
                pop = torch.from_numpy(np.random.randn(POPULATION_SIZE, n_params)).float().to(device)
                fitness = calculate_population_fitness(pop, mother_vector, scores, targets)
                # Normalize the fitness
                normalized_fitness = ((fitness - torch.mean(fitness)) / torch.std(fitness)).to(device)
                # Update the mother vector with the fitness values
                mother_vector = mother_vector.to(device) + (LR / (POPULATION_SIZE * SIGMA)) * torch.matmul(pop.t(), normalized_fitness)
                # Update the model parameters
                nn.utils.vector_to_parameters(mother_vector, model.parameters())
        # Computing accuracy
        num_correct = 0
        num_samples = 0
        for x, y in train_loader:
            x = x.to(device=device)
            y = y.to(device=device)
            x = x.reshape(x.shape[0], -1)
            scores = model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
        print(num_correct, num_samples)
        print(f"accuracy {float(num_correct)/float(num_samples)*100:.2f}")
        print("------------------------------------------")
The most obvious problem is that you only evaluate the model once, at the
scores = model(data)
line, before you start looping over the population. You need to update and re-evaluate the model for each perturbation of the "mother" vector. As written, every member of the population receives the identical fitness, so the normalization divides by a standard deviation of zero and the update fills the parameters with NaNs, which is consistent with the constant 0.0987 / 0.098 accuracies you observe (the frequency of a single class).
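A minimal sketch of the fix, reusing your own names (model, loss, jitter): pass the input batch into the fitness function and run a fresh forward pass after loading each perturbed parameter vector, so every candidate is scored with its own weights.

def fitness_func(solution, data, targets):
    # Load the perturbed parameter vector into the model...
    nn.utils.vector_to_parameters(solution, model.parameters())
    # ...then re-evaluate the model with those parameters
    scores = model(data)
    return loss(scores, targets)

def calculate_population_fitness(pop, mother_vector, data, targets):
    fitness = torch.zeros(pop.shape[0])
    for i, params in enumerate(pop):
        p_try = jitter(mother_vector, params)
        fitness[i] = fitness_func(p_try, data, targets)
    return fitness

Then, in the training loop, drop the single scores = model(data) call and pass the batch itself:

fitness = calculate_population_fitness(pop, mother_vector, data, targets)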