Medical binary classification probability: BCE vs CrossEntropy

Question

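I am a resident physician who really enjoys learning machine learning. I have spent a lot of time reading here and would like to use it in my field (medical imaging). We have an exam called "Datscan scintigraphy", a metabolic view of the brain used to see whether a person has Parkinson's disease (this is a simplification, but it is enough to understand the problem).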

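The problem is that it is a long exam, about 30 minutes, because the "camera" rotates around the patient and acquires 120 projections. Sometimes our elderly patients cannot tolerate it, so we cannot help them with a diagnosis, which is frustrating. That is why I am building a CNN that tries to classify a scan as "normal Datscan" or "abnormal Datscan" using only the first 2 projections (anterior and posterior, numbers 0 and 60), with a final output "probability of abnormal Datscan" between 0 and 1. My goal is that, once I have this probability, I can change the threshold according to our needs and make the classifier more sensitive or more specific.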

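I built a dataset of 887 Datscans, each converted to an npy array of 120 matrices of 128x128 pixels, of which I use only 2 (numbers 0 and 60). The images are grayscale, so each projection has a single channel. I tried different architectures in PyTorch; here is a VGG-style architecture with BCEWithLogitsLoss: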

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class ReseauConvolutionSigmo(nn.Module):
    def __init__(self):
        super(ReseauConvolutionSigmo, self).__init__()
        self.conv1a = nn.Conv2d(2, 64, 3, stride=1)
        self.conv1b = nn.Conv2d(64, 64, 5, stride=1)
        self.pool1 = nn.MaxPool2d(2,2)

        self.conv2a = nn.Conv2d(64, 128, 3, stride=1)
        self.conv2b = nn.Conv2d(128, 128, 3, stride=1)
        self.pool2 = nn.MaxPool2d(2,2)
        
        self.conv3a = nn.Conv2d(128, 256, 3, stride=1)
        self.conv3b = nn.Conv2d(256, 256, 3, stride=1)
        self.pool3 = nn.MaxPool2d(2,2)
                
        self.fc1 = nn.Linear(36864, 84)  
        self.fc2 = nn.Linear(84, 1)       
        
    def forward(self, x):
        x=x.float()
        
        x=self.conv1a(x)
        x=F.relu(x)
        x=self.conv1b(x)
        x=F.relu(x)
        x=self.pool1(x)
        
        x=self.conv2a(x)
        x=F.relu(x)
        x=self.conv2b(x)
        x=F.relu(x)
        x=self.pool2(x)
        
        x=self.conv3a(x)
        x=F.relu(x)
        x=self.conv3b(x)
        x=F.relu(x)
        x=self.pool3(x)
        
        x = torch.flatten(x, 1)  # Flatten the feature maps
        
        try:
            x = F.relu(self.fc1(x))
        except RuntimeError as e:
            e = str(e)
            if e.endswith("Output size is too small"):
                print("Image size is too small.")
            elif "shapes cannot be multiplied" in e:
                required_shape = e[e.index("x") + 1:].split(" ")[0]
                print(f"Linear layer needs to have size: {required_shape}")
            else:
                print(f"Error other: {e}") 
                
        x = self.fc2(x)

        return x

network = ReseauConvolutionSigmo()
n_epochs = 100
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(network.parameters(), lr=0.001)

train_losses = [ ]
train_counter = [ ]
test_losses = [ ]
test_accuracy = [ ]

# device, train() and test() are defined elsewhere in the script (not shown).
network.to(device)
print('******* Evaluation initiale')
test()
for epoch in range(0, n_epochs):
    print('******* Epoch ', epoch)
    train()
    test()
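
The 36864 input size of `fc1` can be checked with the usual output-size arithmetic for unpadded convolutions and 2x2 max pooling. This is only a sketch in plain Python: it assumes 128x128 inputs, as described above, and `conv_out` and `pool_out` are helper names made up for the illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 128                      # assumed input height and width
for kernel in (3, 5):           # conv1a (3x3), conv1b (5x5)
    size = conv_out(size, kernel)
size = pool_out(size)           # pool1
for kernel in (3, 3):           # conv2a, conv2b
    size = conv_out(size, kernel)
size = pool_out(size)           # pool2
for kernel in (3, 3):           # conv3a, conv3b
    size = conv_out(size, kernel)
size = pool_out(size)           # pool3

flat_features = 256 * size * size   # 256 channels after conv3b
print(size, flat_features)
```

With these assumptions the feature map is 12x12 after the last pooling layer, which gives 256 * 12 * 12 = 36864 features, matching the `nn.Linear(36864, 84)` layer.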

However, when I train this network, the outputs for all 6 elements of a batch quickly converge to the same value:

Evaluation initiale
test loss= 0.7000894740570424
Output tensor([[0.0826],
[0.0827],
[0.0827],
[0.0825],
[0.0827],
[0.0827]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[0.],[0.], [0.],[1.],[0.],[1.]])
Accuracy in test 57.36434108527132 %
Epoch  0
train loss= 0.6777993538058721
test loss= 0.6830593472303346
Output tensor([[-0.3489],
[-0.3479],
[-0.3391],
[-0.3410],
[-0.3442],
[-0.3469]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[1.], [0.],[0.],[0.], [0.],[0.]])
Accuracy in test 57.36434108527132 %
Epoch  1
train loss= 0.7050089922088844
test loss= 0.6875958317934081
Output tensor([[-0.0826],
[-0.0826],
[-0.0826],
[-0.0826],
[-0.0826],
[-0.0826]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[0.], [0.],[1.],[1.], [0.], [1.]])
Accuracy in test 57.751937984496124 %
Epoch  2
train loss= 0.6914097838676893
test loss= 0.6917881480483121
Output tensor([[-0.0191],
[-0.0191],
[-0.0191],
[-0.0191],
[-0.0191],
[-0.0191]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[0.],[1.], [1.],[1.],[0.],[1.]])
Accuracy in test 57.36434108527132 %

Even at later epochs:

Epoch  40
train loss= 0.6704580792440817
test loss= 0.6978785312452982
Output tensor([[-0.6284],
[-0.6284],
[-0.6284],
[-0.6284],
[-0.6284],
[-0.6284]])
Predicted tensor([[0.], [0.], [0.],[0.],[0.], [0.]])
Datscan tensor([[0.],[0.], [0.],[0.],[1.],[1.]])
correct 147
total 258
Accuracy in test 56.97674418604651 %

As you can see, the training loss barely decreases and the accuracy stays stuck at about 57%.
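
One way to read these numbers is to compute by hand the mean BCE of a network that emits the same logit for every test scan. The sketch below is plain Python (no PyTorch needed) and rests on two inferences from the log, not stated facts: that every test output equals the -0.6284 shown at epoch 40, and that the 147 correct predictions are exactly the normal scans, so 111 of 258 are abnormal. `bce_with_logits` is a helper written for this illustration.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy for one sample, computed from a raw logit."""
    # log(sigmoid(z)) = -log(1 + exp(-z)); log(1 - sigmoid(z)) = -log(1 + exp(z))
    log_p = -math.log1p(math.exp(-logit))
    log_not_p = -math.log1p(math.exp(logit))
    return -(target * log_p + (1.0 - target) * log_not_p)


# Assumptions read off the epoch 40 log (see the text above).
constant_logit = -0.6284
n_total, n_normal = 258, 147
n_abnormal = n_total - n_normal

mean_loss = (
    n_abnormal * bce_with_logits(constant_logit, 1.0)
    + n_normal * bce_with_logits(constant_logit, 0.0)
) / n_total

print(f"mean BCE for a constant output: {mean_loss:.4f}")
print(f"BCE when always predicting 0.5: {math.log(2):.4f}")
```

Under those assumptions the result is about 0.698, essentially the reported test loss, while ln 2 (about 0.693) is the loss of a model that always outputs 0.5. This suggests the network has collapsed to a single constant close to the class prior and is ignoring the image; the 57% accuracy is then roughly the share of normal scans in the test set.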

Somewhat desperate, I tried another criterion, CrossEntropyLoss. It works much better: for example, the last 3 epochs reach a final accuracy of about 79%:

Epoch  47
train loss= 0.0002839015607657339
test loss= 1.7087745488627646
correct 203
total 258
Accuracy in test 78.68217054263566 %
Sortie du réseau :
tensor([[-13.9290,   4.8103],
        [  3.7896,  -9.5477],
        [ -3.8057,  -0.1662],
        [  1.8018,  -3.5083],
        [ -3.6199,  -2.2624],
        [  6.0148, -12.3137]])
Datscan :    tensor([1, 1, 0, 0, 1, 0])
Prédiction :  tensor([1, 0, 1, 0, 1, 0])

Epoch  48
train loss= 0.0002534455537248284
test loss= 1.7621866015008938
correct 201
total 258
Accuracy in test 77.90697674418605 %
Sortie du réseau :
tensor([[-24.7145,   7.8902],
        [-21.3964,   8.6213],
        [  2.1064,  -0.7032],
        [ -1.3331,  -0.8390],
        [ -5.9108,   4.1722],
        [-14.4751,   4.5746]])
Datscan :    tensor([1, 1, 0, 1, 1, 1])
Prédiction :  tensor([1, 1, 0, 1, 1, 1])

Epoch  49
train loss= 0.00022697989463199136
test loss= 1.6694575882692397
correct 204
total 258
Accuracy in test 79.06976744186046 %
Sortie du réseau :
tensor([[ -9.4081,   1.5622],
        [ -0.1025,   0.0649],
        [-26.1112,   8.3820],
        [-12.4035,   3.2135],
        [ -6.0753,   0.1667],
        [  9.9138,  -9.7220]])
Datscan :    tensor([0, 0, 1, 1, 1, 0])
Prédiction :  tensor([1, 1, 1, 1, 1, 0])


Sortie du réseau :
tensor([[ 14.8245, -11.1206],
        [  4.8293,  -2.1229],
        [-19.1812,   4.6617],
        [ -7.9391,   3.3256],
        [ 30.0683, -26.7278],
        [-11.0685,   3.2678]])

Datscan :    tensor([0, 1, 1, 1, 0, 1])
Prédiction :  tensor([0, 0, 1, 1, 0, 1])

Here are my questions:

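What makes the BCE loss train so badly, even though this is a binary classification problem? Why do all 6 elements of a batch quickly end up with the same output? I tried changing the learning rate without any noticeable improvement; could it be a problem with the optimizer?
As I understand it, with BCEWithLogitsLoss the output is one value per element of the batch of 6: if the value is negative the model predicts "normal", and if it is positive it predicts "abnormal". But they are all negative, so every scan is predicted as normal. Since my goal is to output the "probability of an abnormal Datscan", if this model had better accuracy I could pass this output through a sigmoid to get a probability between 0 and 1, right?
With CrossEntropyLoss the output is a 6 x 2 tensor giving the "confidence" of being in the left class ("normal") or the right class ("abnormal"), and the larger of the two values determines the predicted class. For example: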

tensor([[ 14.8245, -11.1206], = predicted normal
        [  4.8293,  -2.1229], = predicted normal
        [-19.1812,   4.6617], = predicted abnormal
        [ -7.9391,   3.3256], = predicted abnormal
        [ 30.0683, -26.7278], = predicted normal
        [-11.0685,   3.2678]]) = predicted abnormal

However, although this version has better accuracy, the question is how to express these outputs as a "probability of abnormal".

Thank you very much for your help. I look forward to reading your thoughts on this; it is always very interesting!

Answer 1

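I can't say why BCEWithLogitsLoss is not working for you, but for the second case you can apply a softmax to the logits and then look only at the second column, which is the probability that the sample is abnormal. Given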

import torch

a = torch.tensor([[ 14.8245, -11.1206],
                  [  4.8293,  -2.1229],
                  [-19.1812,   4.6617],
                  [ -7.9391,   3.3256],
                  [ 30.0683, -26.7278],
                  [-11.0685,   3.2678]])

after applying softmax

b = torch.softmax(a, dim = -1)

you end up with

tensor([[1.0000e+00, 5.3974e-12],
        [9.9904e-01, 9.5561e-04],
        [4.4173e-11, 1.0000e+00],
        [1.2817e-05, 9.9999e-01],
        [1.0000e+00, 2.1566e-25],
        [5.9405e-07, 1.0000e+00]])
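
As a cross-check, for exactly two classes the second softmax column equals the sigmoid of the logit difference (abnormal minus normal), so the two-output CrossEntropyLoss model yields the same kind of probability a single-logit BCEWithLogitsLoss model would. A minimal sketch in plain Python using the logits above; `softmax_pair` and `sigmoid` are helpers written for this illustration (in PyTorch, `torch.softmax` and `torch.sigmoid` play these roles).

```python
import math


def softmax_pair(z_normal, z_abnormal):
    """Numerically stable softmax over two logits."""
    m = max(z_normal, z_abnormal)
    e_normal = math.exp(z_normal - m)
    e_abnormal = math.exp(z_abnormal - m)
    total = e_normal + e_abnormal
    return e_normal / total, e_abnormal / total


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


logits = [
    (14.8245, -11.1206),
    (4.8293, -2.1229),
    (-19.1812, 4.6617),
    (-7.9391, 3.3256),
    (30.0683, -26.7278),
    (-11.0685, 3.2678),
]

p_softmax = [softmax_pair(z0, z1)[1] for z0, z1 in logits]
p_sigmoid = [sigmoid(z1 - z0) for z0, z1 in logits]

for ps, pg in zip(p_softmax, p_sigmoid):
    print(f"softmax: {ps:.4e}   sigmoid of difference: {pg:.4e}")
```

Both columns agree to floating-point precision, so either form can be thresholded to trade sensitivity against specificity.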

So the probabilities of an abnormal Datscan are

[5.3974e-12, 9.5561e-04, 1.0000e+00, 9.9999e-01, 2.1566e-25, 1.0000e+00]
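
Once you have these probabilities, moving the decision threshold is what trades sensitivity against specificity. The sketch below is plain Python and only shows the mechanics: it pairs the six probabilities above with the labels printed for the same batch in the question (`[0, 1, 1, 1, 0, 1]`), and `sensitivity_specificity` is a helper name made up for this illustration. Six samples are far too few to choose a real threshold; that should be done on a held-out validation set.

```python
# "Abnormal" probabilities from the softmax output and the labels of that batch.
probs = [5.3974e-12, 9.5561e-04, 1.0000e+00, 9.9999e-01, 2.1566e-25, 1.0000e+00]
labels = [0, 1, 1, 1, 0, 1]


def sensitivity_specificity(probs, labels, threshold):
    """Classify as abnormal when prob >= threshold, then score the result.

    Assumes the labels contain at least one sample of each class.
    """
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


default = sensitivity_specificity(probs, labels, threshold=0.5)
lowered = sensitivity_specificity(probs, labels, threshold=5e-4)

print("threshold 0.5  -> sensitivity, specificity:", default)
print("threshold 5e-4 -> sensitivity, specificity:", lowered)
```

At 0.5 the second scan (probability 9.5561e-04, label abnormal) is missed, giving a sensitivity of 3/4; lowering the threshold to 5e-4 recovers it on this tiny batch. Note how extreme the probabilities are: together with a test loss far above the training loss, that hints the model is overconfident, so the raw softmax output may need calibration before it is read as a clinical probability.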
