I am trying to train a multimodal model on the 2D heat equation.
Context: the best model I have so far is a CNN with a 5*5 kernel, optimized to output the temperature map for one given diffusion coefficient. Now I am trying to feed the model other coefficients: I pass the coefficient through a small feed-forward network so that, for any diffusion coefficient, the model finds a way to produce the appropriate kernel that yields the correct output temperature map.
Problem: the 2 linear layers are not being optimized; their parameter values stay constant during training.
PyTorch code:
def train(num_epochs=epochs, learning_rate=lr, verbose=True):
    class FrozenConv2d(nn.Conv2d):
        def __init__(self):
            super().__init__(in_channels=1, out_channels=1, kernel_size=(5, 5),
                             padding=2, padding_mode='replicate', bias=False)
            self.weight.requires_grad = False  # freeze the convolution kernel
            #self.bias.requires_grad = False

        def forward(self, x):
            out = nn.functional.conv2d(x, self.weight, bias=None, padding=2)  #, self.bias)
            return out

    class Smart(nn.Module):
        def __init__(self, output_size=10000, xdim=100):
            super(Smart, self).__init__()
            self.xdim = xdim
            self.ydim = int(output_size / xdim)
            self.l1 = nn.Linear(1, 5)
            self.l2 = nn.Linear(5, 25)
            # the following comes from trainedCNN_0.003, which performs well at alpha = 0.0018 (weight + bias of that model):
            init_kernel = torch.Tensor([0.1826707, -0.08769821, 0.10698076, 0.16084763, -0.00565027,
                                        0.14241125, -0.12225045, -0.05741217, 0.0820392, 0.19563171,
                                        -0.05193448, 0.17927778, 0.01597985, 0.00086921, 0.07576136,
                                        -0.0163596, 0.15580693, 0.13373081, -0.05192659, 0.02516613,
                                        0.06127854, 0.18883559, -0.06456435, 0.1633646, 0.07276782])
            init_kernel = init_kernel / 0.0018
            # we need another matrix format, (25, 5), such that matrix @ (1,1,1,1,1).T = init_kernel
            kernel = torch.Tensor(
                [[elt / 5] * 5 for elt in init_kernel]
            )
            self.l2.weight = nn.Parameter(kernel)

        def forward(self, x):
            alpha = x[1]
            alpha = alpha.view(1)  # reformat the alpha coefficient of the batch
            pre_kernel = nn.ReLU()(self.l1(alpha))
            kernel = self.l2(pre_kernel).view(1, 1, 5, 5)
            conv_layer = FrozenConv2d()
            conv_layer.weight = nn.Parameter(kernel)
            #conv_layer.bias = nn.Parameter(torch.zeros(1))
            image = x[0].view(1, self.ydim, self.xdim)  # reformat the image of the batch
            out = nn.ReLU()(conv_layer(image))
            out = out.view(self.ydim, self.xdim)
            return out

    model = Smart().to(device)

    # loss and optimizer, scheduler, writer
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    if schedule:
        scheduler = lr_scheduler.MultiStepLR(optimizer=optimizer, milestones=sched_step, gamma=0.5)
    if tensorboard:
        writer = SummaryWriter(tensorboard_path + '/' + tensorboard_name)
        step = 0
    print("\n")

    # training loop
    model.train()
    for epoch in range(num_epochs):
        for inputs, true_outputs in train_set:
            inputs = [inp.to(device) for inp in inputs]
            true_outputs = true_outputs.to(device)
            # forward
            pred_outputs = model(inputs)
            loss = criterion(pred_outputs, true_outputs)
            # backward
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if tensorboard:
                writer.add_scalar('Training loss', loss, global_step=step)
                step += 1
        if schedule:
            scheduler.step()
        if (epoch + 1) % epoch_step_to_print == 0 and verbose:
            print(f'epoch {epoch+1} / {num_epochs}, loss = {loss.item():.6f}')
    return model

model = train()
(I initialize the linear weights this way because I know these values work well for one particular coefficient.)
I check my parameter values with

layers = [x.data for x in model.parameters()]
The values never change, and neither does the loss during training (identical down to every decimal place).
I also checked that the parameters still have requires_grad = True, and they do, so nothing is wrong there.
But the model is just not optimizing itself...
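While debugging I put together a minimal standalone sketch (separate from the model above, with made-up shapes) to check whether gradients reach a linear layer when its output is used as a convolution kernel, directly versus after being wrapped in nn.Parameter the way my forward does:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
l2 = nn.Linear(5, 25)
pre = torch.ones(5)
image = torch.randn(1, 1, 8, 8)

# Route A: use the generated kernel directly in F.conv2d
kernel = l2(pre).view(1, 1, 5, 5)
F.conv2d(image, kernel, padding=2).sum().backward()
grad_reached_l2 = l2.weight.grad is not None  # gradient flowed back to l2

# Route B: wrap the generated kernel in nn.Parameter first,
# as done via conv_layer.weight = nn.Parameter(kernel) in forward
l2.zero_grad(set_to_none=True)
k2 = l2(pre).view(1, 1, 5, 5)
p = pparam = nn.Parameter(k2)  # a fresh leaf tensor, detached from the graph
F.conv2d(image, p, padding=2).sum().backward()
grad_after_wrap = l2.weight.grad  # did anything reach l2 this time?

print(grad_reached_l2, p.grad_fn, p.is_leaf, grad_after_wrap)
```

If I read this right, route A propagates a gradient into l2.weight while route B leaves l2.weight.grad empty, since the Parameter wrapper is a new leaf with no grad_fn.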
PS: I am not using batching, because a Conv2d layer is not meant to apply a different kernel to each sample of a batch. So for now the only way to try my architecture is to iterate over the dataset directly instead of using a DataLoader.
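As an aside on that batching point: I have seen the suggestion that per-sample kernels can be emulated with a grouped convolution by folding the batch dimension into channels. A sketch of the idea (toy shapes, not wired into my model, and I have not verified it helps my case):

```python
import torch
import torch.nn.functional as F

B, H, W = 4, 100, 100
images = torch.randn(B, 1, H, W)   # one channel per sample
kernels = torch.randn(B, 1, 5, 5)  # a different 5x5 kernel per sample

# Fold the batch into the channel dimension and set groups=B so that
# sample i is convolved only with kernel i.
x = images.view(1, B, H, W)
out = F.conv2d(x, kernels, padding=2, groups=B).view(B, 1, H, W)

# Reference: convolve each sample separately and compare.
ref = torch.stack([F.conv2d(images[i:i+1], kernels[i:i+1], padding=2)
                   for i in range(B)]).view(B, 1, H, W)
print(torch.allclose(out, ref, atol=1e-5))
```

If that equivalence holds, it might let me keep a DataLoader with batches even though each sample needs its own generated kernel.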