My task is to create an artificial neural network that uses an evolution strategies (ES) algorithm as the optimizer (derivative-free, no backprop). The dataset I'm using is MNIST. For now, I'm just trying to get this working with a plain fully connected network.
I found a Colab notebook that does the same thing, but on sklearn's "make_moons" dataset. I tried merging the notebook's code into mine; it runs without errors, but it keeps outputting the same accuracy. Typically the first few outputs differ, and then it "converges" to 0.0987 on the training set and 0.098 on the test set (essentially chance level for ten classes). Training also takes a very long time; maybe there are redundant iterations?
Here is the Colab notebook, if you want to take a look: https://colab.research.google.com/drive/1SY38Evy4U9HfUDkofPZ2pLQzEnwvYC81?usp=sharing
I tried some suggestions from StackOverflow, such as tuning the hyperparameters (learning rate, hidden units) and switching to Leaky ReLU in case of "dying ReLU"; none of them worked. This leads me to believe the problem is in the ES optimizer.
I'm new to PyTorch, so please point out anything obviously wrong!
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import numpy as np
from tqdm import tqdm
# Set device to CUDA
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# NN & DataLoader hyperparameters
input_size = 784
num_classes = 10
learning_rate = 0.01
batch_size = 64
num_epochs = 1
# Load data
train_dataset = datasets.MNIST(root='dataset/', train=True, transform=transforms.ToTensor(), download=False)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_dataset = datasets.MNIST(root='dataset/', train=False, transform=transforms.ToTensor(), download=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=True)
# Connected NN
model = nn.Sequential(
    nn.Linear(input_size, 40),
    nn.ReLU(0.1),
    nn.Linear(40, 20),
    nn.ReLU(0.1),
    nn.Linear(20, num_classes),
    nn.ReLU(0.1),
)
model = model.float().to(device)
# Custom loss function
loss_func = nn.CrossEntropyLoss()
def loss(y_pred, y_true):
    # We are maximizing in ES, so take the reciprocal of the loss;
    # now, increasing loss means the model is learning
    return 1 / loss_func(y_pred, y_true)
# Fitness function
def fitness_func(solution, scores, targets):
    # solution is a vector of parameters, like mother_parameters
    nn.utils.vector_to_parameters(solution, model.parameters())
    return loss(scores, targets)
# In ES, our population is a slightly altered version of the mother parameters, so we implement a jitter function
def jitter(mother_params, state_dict):
    params_try = mother_params + SIGMA * state_dict.to(device)
    return params_try
# Now, we calculate the fitness of entire population
def calculate_population_fitness(pop, mother_vector, scores, targets):
    fitness = torch.zeros(pop.shape[0])
    for i, params in enumerate(pop):
        p_try = jitter(mother_vector, params)
        fitness[i] = fitness_func(p_try, scores, targets)
    return fitness
# Calculating number of parameters
n_params = nn.utils.parameters_to_vector(model.parameters()).shape[0]
# Now, implementing the training algorithm
mother_parameters = model.parameters()
mother_vector = nn.utils.parameters_to_vector(mother_parameters)
# ES hyperparameters
SIGMA = 0.01
LR = 0.01
POPULATION_SIZE = 50
ITERATIONS = 500
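# The update in the training loop below implements the standard ES gradient estimate,
#   mother_vector += (LR / (POPULATION_SIZE * SIGMA)) * sum_i fitness_i * eps_i,
# which assumes each fitness_i is measured at the perturbed parameters
# mother_vector + SIGMA * eps_i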
# Train network
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):
        data = data.to(device=device)
        targets = targets.to(device=device)
        # Correcting shape
        data = data.reshape(data.shape[0], -1)
        scores = model(data)
        print(f"{batch_idx} out of {len(train_loader)}")
        # ES optimizer
        with torch.no_grad():  # No need for gradients
            for iteration in tqdm(range(ITERATIONS)):
                pop = torch.from_numpy(np.random.randn(POPULATION_SIZE, n_params)).float().to(device)
                fitness = calculate_population_fitness(pop, mother_vector, scores, targets)
                # Normalize the fitness
                normalized_fitness = ((fitness - torch.mean(fitness)) / torch.std(fitness)).to(device)
                # Update the mother vector with the fitness values
                mother_vector = mother_vector.to(device) + (LR / (POPULATION_SIZE * SIGMA)) * torch.matmul(pop.t(), normalized_fitness)
                # Update the model parameters
                nn.utils.vector_to_parameters(mother_vector, model.parameters())
        # Computing accuracy
        num_correct = 0
        num_samples = 0
        for x, y in train_loader:
            x = x.to(device=device)
            y = y.to(device=device)
            x = x.reshape(x.shape[0], -1)
            scores = model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
        print(num_correct, num_samples)
        print(f"accuracy {float(num_correct)/float(num_samples)*100:.2f}")
        print("------------------------------------------")
The most obvious problem is that you only evaluate the model once, at the
scores = model(data)
line, before you start looping over the population. You need to update and re-evaluate the model for each perturbation of the "mother" vector. As written, every member of the population receives the identical fitness, so the normalization divides by a standard deviation of zero and the update fills the parameters with NaNs, which is consistent with the constant 0.0987 / 0.098 accuracies you observe (the frequency of a single class).
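A minimal sketch of the fix, reusing your own names (model, loss, jitter): pass the input batch into the fitness function and run a fresh forward pass after loading each perturbed parameter vector, so every candidate is scored with its own weights.

def fitness_func(solution, data, targets):
    # Load the perturbed parameter vector into the model...
    nn.utils.vector_to_parameters(solution, model.parameters())
    # ...then re-evaluate the model with those parameters
    scores = model(data)
    return loss(scores, targets)

def calculate_population_fitness(pop, mother_vector, data, targets):
    fitness = torch.zeros(pop.shape[0])
    for i, params in enumerate(pop):
        p_try = jitter(mother_vector, params)
        fitness[i] = fitness_func(p_try, data, targets)
    return fitness

Then, in the training loop, drop the single scores = model(data) call and pass the batch itself:

fitness = calculate_population_fitness(pop, mother_vector, data, targets)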