I am trying to build a PyTorch classifier on a tabular dataset. My model has the following architecture:
```python
import torch
from torch import nn

BATCH_SIZE = 8
EPOCHS = 10
HIDDEN_NEURONS = 25
LR = 1e-3

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.input_layer = nn.Linear(X.shape[1], HIDDEN_NEURONS)
        self.linear = nn.Linear(HIDDEN_NEURONS, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.input_layer(x)
        x = self.linear(x)
        x = self.sigmoid(x)
        return x
```
The model is quite simple and small. It has the following training loop:
```python
total_loss_train_plot = []
total_loss_validation_plot = []
total_acc_train_plot = []
total_acc_validation_plot = []

for epoch in range(EPOCHS):
    total_acc_train = 0
    total_loss_train = 0
    total_acc_val = 0
    total_loss_val = 0

    ## Training
    for indx, data in enumerate(train_dataloader):
        input, label = data
        input = input.to(device)   # .to() is not in-place; reassign the result
        label = label.to(device)

        prediction = model(input).squeeze(1)
        batch_loss = criterion(prediction, label)
        total_loss_train += batch_loss.item()

        acc = (prediction.round() == label).sum().item()
        total_acc_train += acc

        batch_loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    ## Validation
    with torch.no_grad():
        for indx, data in enumerate(validation_dataloader):
            input, label = data
            input = input.to(device)
            label = label.to(device)

            prediction = model(input).squeeze(1)
            batch_loss = criterion(prediction, label)
            total_loss_val += batch_loss.item()  # was mistakenly added to total_loss_train

            acc = (prediction.round() == label).sum().item()
            total_acc_val += acc

    total_loss_train_plot.append(round(total_loss_train / 1000, 4))
    total_loss_validation_plot.append(round(total_loss_val / 1000, 4))
    total_acc_train_plot.append(round(total_acc_train / len(training_data) * 100, 4))
    total_acc_validation_plot.append(round(total_acc_val / len(validation_data) * 100, 4))

    print(f"Epoch no. {epoch + 1} "
          f"Train Loss: {total_loss_train / 1000:.4f} "
          f"Train Accuracy: {total_acc_train / len(training_data) * 100:.4f} "
          f"Validation Loss: {total_loss_val / 1000:.4f} "
          f"Validation Accuracy: {total_acc_val / len(validation_data) * 100:.4f}")
    print("=" * 50)
```
The loss and accuracy do not improve and stay essentially constant:
Epoch no. 1 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
Epoch no. 2 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
Epoch no. 3 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
Epoch no. 4 Train Loss: 105.8375 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
Epoch no. 5 Train Loss: 105.8375 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
Epoch no. 6 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
Epoch no. 7 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
Epoch no. 8 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
Epoch no. 9 Train Loss: 105.9250 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
Epoch no. 10 Train Loss: 105.8375 Train Accuracy: 44.9603 Validation Loss: 0.0000 Validation Accuracy: 46.4443
==================================================
But when I change the criterion to `BCEWithLogitsLoss` and remove the sigmoid layer, training improves: the loss goes down and the accuracy goes up, and everything works well. With the logits loss I get the following results:
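For reference, a minimal sketch of what that change looks like (dimensions and data are made up; `10` stands in for `X.shape[1]`). One detail worth noting: once the model emits raw logits, accuracy should be computed by thresholding `torch.sigmoid(prediction)`, since rounding raw logits is not meaningful:

```python
import torch
from torch import nn

# Same architecture but without the final Sigmoid: the model outputs raw logits.
# (As in the original model, there is no activation between the two Linear layers.)
model = nn.Sequential(
    nn.Linear(10, 25),   # 10 stands in for X.shape[1]
    nn.Linear(25, 1),
)
criterion = nn.BCEWithLogitsLoss()  # applies the sigmoid internally, numerically stably

x = torch.randn(8, 10)
label = torch.randint(0, 2, (8,)).float()

prediction = model(x).squeeze(1)          # logits, not probabilities
batch_loss = criterion(prediction, label)

# Accuracy: threshold the probability, not the raw logit.
acc = (torch.sigmoid(prediction).round() == label).sum().item()
```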
Epoch no. 1 Train Loss: 0.7597 Train Accuracy: 96.7476 Validation Loss: 0.0000 Validation Accuracy: 98.8270
==================================================
Epoch no. 2 Train Loss: 0.9141 Train Accuracy: 96.2841 Validation Loss: 0.0000 Validation Accuracy: 98.6070
==================================================
Epoch no. 3 Train Loss: 0.6364 Train Accuracy: 97.2189 Validation Loss: 0.0000 Validation Accuracy: 98.1305
==================================================
Epoch no. 4 Train Loss: 0.7539 Train Accuracy: 96.5748 Validation Loss: 0.0000 Validation Accuracy: 98.8270
==================================================
Epoch no. 5 Train Loss: 0.8025 Train Accuracy: 96.6062 Validation Loss: 0.0000 Validation Accuracy: 96.8109
==================================================
Epoch no. 6 Train Loss: 0.6069 Train Accuracy: 96.8340 Validation Loss: 0.0000 Validation Accuracy: 98.9370
==================================================
Epoch no. 7 Train Loss: 0.6626 Train Accuracy: 96.8261 Validation Loss: 0.0000 Validation Accuracy: 96.2977
==================================================
Epoch no. 8 Train Loss: 0.5833 Train Accuracy: 96.6140 Validation Loss: 0.0000 Validation Accuracy: 98.6804
==================================================
Epoch no. 9 Train Loss: 0.4303 Train Accuracy: 97.3604 Validation Loss: 0.0000 Validation Accuracy: 98.2405
==================================================
Epoch no. 10 Train Loss: 0.5376 Train Accuracy: 97.0225 Validation Loss: 0.0000 Validation Accuracy: 96.9208
==================================================
I understand the difference between the two functions: one (`BCELoss`) expects probabilities, i.e. outputs after the sigmoid, while the other expects the raw logits before the sigmoid. But why does the network behave so differently when I swap them? I have previously fine-tuned BERT for binary text classification with `BCELoss` and it worked very well.
Is there an explanation for this?
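A quick sanity check (my own illustration, not from the post above) confirms that the two criteria compute the same quantity for moderate logits, so any behavioral difference must come from numerical effects, not from the loss definition itself:

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(8)
target = torch.randint(0, 2, (8,)).float()

# BCEWithLogitsLoss(z) == BCELoss(sigmoid(z)) mathematically;
# the fused version is just the numerically stable formulation.
loss_logits = nn.BCEWithLogitsLoss()(logits, target)
loss_probs = nn.BCELoss()(torch.sigmoid(logits), target)
```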
I found the problem: I had to normalize my data. I scaled the data with the following code, and now it works well:
```python
# Max-abs scaling: divide each column by its maximum absolute value.
for column in data_df.columns:
    data_df[column] = data_df[column] / data_df[column].abs().max()
data_df.head()
```
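As a plausible explanation (my own illustration, not from the original post): with unnormalized features the pre-sigmoid activations can be very large, the sigmoid saturates to exactly 0 or 1 in float32, and the gradient of `BCELoss` with respect to the logit vanishes, so training stalls. `BCEWithLogitsLoss` fuses the sigmoid into the loss, so its gradient, `sigmoid(z) - target`, stays useful even under saturation:

```python
import torch
from torch import nn

# A large pre-activation, which unnormalized features easily produce.
target = torch.tensor([0.0])

# Sigmoid + BCELoss: sigmoid(30) rounds to 1.0 in float32, and the
# sigmoid's local gradient p * (1 - p) becomes exactly zero.
z = torch.tensor([30.0], requires_grad=True)
loss = nn.BCELoss()(torch.sigmoid(z), target)
loss.backward()
grad_bce = z.grad.clone()           # numerically zero: learning stalls

# BCEWithLogitsLoss: the fused gradient is sigmoid(z) - target.
z2 = torch.tensor([30.0], requires_grad=True)
loss2 = nn.BCEWithLogitsLoss()(z2, target)
loss2.backward()
grad_logits = z2.grad.clone()       # close to 1.0: a useful learning signal
```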