Using nn.CrossEntropyLoss between outputs and target labels

Question description  Votes: 0  Answers: 3

I am using this code.

Function to train the model:

def train():
  
  model.train()

  total_loss, total_accuracy = 0, 0
  
  # empty list to save model predictions
  total_preds=[]
  
  # iterate over batches
  for step,batch in enumerate(train_dataloader):
    
    # progress update after every 50 batches.
    if step % 50 == 0 and not step == 0:
      print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))

    # push the batch to gpu
    #batch = [r for r in batch]
 
    sent_id, mask, labels = batch['input_ids'],batch['attention_mask'],batch['labels']
    print(6)
    print(sent_id)
    print(mask)
    print(labels)
    print(batch['input_ids'].shape)
    print(batch['attention_mask'].shape)
    print(batch['labels'].shape)

    # clear previously calculated gradients 
    model.zero_grad() 
    print(7)       

    # get model predictions for the current batch
    preds = model(sent_id, mask)
    print(8)
    print(len(preds))
    print(len(labels))
    print(preds.size())
    
   
    preds =torch.argmax(preds, dim=1)
    print(preds)
    print(labels)

    # compute the loss between actual and predicted values
    loss = loss_fn(preds, labels)
    print(9)

    # add on to the total loss
    total_loss = total_loss + loss.item()
    print(10)

    # backward pass to calculate the gradients
    loss.backward()

    # clip the gradients to 1.0; this helps prevent the exploding gradient problem
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

    # update parameters
    optimizer.step()

    # model predictions may be stored on GPU, so move them to CPU before converting to NumPy
    preds = preds.detach().cpu().numpy()

    # append the model predictions
    total_preds.append(preds)

  # compute the training loss of the epoch
  avg_loss = total_loss / len(train_dataloader)
  
  # predictions are in the form of (no. of batches, size of batch, no. of classes).
  # reshape the predictions in form of (number of samples, no. of classes)
  total_preds  = np.concatenate(total_preds, axis=0)

  # return the loss and predictions
  return avg_loss, total_preds

import torch.nn as nn

loss_fn=nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001)

I get an error when computing the loss. preds is

tensor([5, 1, 1, 1, 0, 2, 1, 4, 2, 3, 0, 2, 0, 1, 0, 3, 5, 3, 1, 2, 0, 2, 2, 1, 0, 1, 4, 0, 5, 5, 4, 5, 0, 2, 0, 1, 4, 0, 0, 3, 5, 1, 1, 1, 4, 4, 4, 1, 2, 1, 3, 3, 2, 1, 0, 2, 0, 4, 4, 4, 3, 2, 0, 5])

and labels is

tensor([0, 0, 1, 2, 3, 0, 0, 0, 0, 1, 1, 0, 0, 0, 4, 0, 0, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 2, 1, 3, 2, 0, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 5, 0, 0, 3, 0, 0, 1, 0, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0, 0])

I use them as loss = loss_fn(preds, labels) and get this error:

in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   2844     if size_average is not None or reduce is not None:
   2845         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
   2847 
   2848 

RuntimeError: Expected floating point type for target with class probabilities, got Long
python neural-network torch bert-language-model cross-entropy
3 Answers
6
votes

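RuntimeError: Expected floating point type for target with class probabilities, got Long

The error is quite clear: you need to convert the dtype of your target tensor to float. This comes from the loss function you are using. Since you chose cross-entropy (CE) loss, you end up with probabilities, and those probabilities are naturally floating-point numbers, which means your targets should be float as well. For example, if you have a target tensor a = [1, 0, 0, 1], you need to convert it to [1.0, 0.0, 0.0, 1.0].

You can check all the available types in the table below.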

╔══════════════════════════╦═══════════════════════════════╦════════════════════╦═════════════════════════╗
║        Data type         ║             dtype             ║     CPU tensor     ║       GPU tensor        ║
╠══════════════════════════╬═══════════════════════════════╬════════════════════╬═════════════════════════╣
║ 32-bit floating point    ║ torch.float32 or torch.float  ║ torch.FloatTensor  ║ torch.cuda.FloatTensor  ║
║ 64-bit floating point    ║ torch.float64 or torch.double ║ torch.DoubleTensor ║ torch.cuda.DoubleTensor ║
║ 16-bit floating point    ║ torch.float16 or torch.half   ║ torch.HalfTensor   ║ torch.cuda.HalfTensor   ║
║ 8-bit integer (unsigned) ║ torch.uint8                   ║ torch.ByteTensor   ║ torch.cuda.ByteTensor   ║
║ 8-bit integer (signed)   ║ torch.int8                    ║ torch.CharTensor   ║ torch.cuda.CharTensor   ║
║ 16-bit integer (signed)  ║ torch.int16 or torch.short    ║ torch.ShortTensor  ║ torch.cuda.ShortTensor  ║
║ 32-bit integer (signed)  ║ torch.int32 or torch.int      ║ torch.IntTensor    ║ torch.cuda.IntTensor    ║
║ 64-bit integer (signed)  ║ torch.int64 or torch.long     ║ torch.LongTensor   ║ torch.cuda.LongTensor   ║
║ Boolean                  ║ torch.bool                    ║ torch.BoolTensor   ║ torch.cuda.BoolTensor   ║
╚══════════════════════════╩═══════════════════════════════╩════════════════════╩═════════════════════════╝

To convert a tensor to another dtype, you can use either of the following:

sample_tensor = sample_tensor.type(torch.FloatTensor)

sample_tensor = sample_tensor.to(torch.float)

(I am not sure whether the tensor needs to be reassigned.)
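As a minimal sketch of the conversions above (the tensor name `sample_tensor` follows the answer; the values are made up for illustration): `.to()` and `.float()` return a new tensor rather than modifying the original in place, so the result does need to be assigned to a name if you want to use it.

```python
import torch

# Hypothetical integer tensor, used only for illustration.
sample_tensor = torch.tensor([1, 0, 0, 1])
print(sample_tensor.dtype)

# .to() returns a new tensor; the original keeps its dtype.
converted = sample_tensor.to(torch.float)
print(converted.dtype)
print(sample_tensor.dtype)

# .float() is a shorthand for the same conversion.
also_converted = sample_tensor.float()
print(also_converted.dtype)
```

Printing the dtypes before and after shows that only the returned tensors are floating point, while `sample_tensor` itself stays an integer tensor.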


1
vote

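The problem is that you are passing the wrong preds (tensor) value to the loss_fn function. Looking closely, you will see that you are passing the output of preds = torch.argmax(preds, dim=1), whereas you should pass the output of preds = model(sent_id, mask). As a result, two tensors of dtype int64 are passed to the loss function. However, the loss function (CrossEntropyLoss) expects a tensor of dtype float32 as its first argument (i.e., the input argument); see "Examples" at https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html. That is why you get the error: "Expected floating point type ..."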
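The diagnosis can be reproduced in isolation. The following is a small sketch with made-up data (4 samples, 6 classes) standing in for one batch; the names `logits` and `class_ids` are hypothetical and not from the question. It assumes a PyTorch version recent enough to support class-probability targets, which the error message in the question suggests.

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# Made-up stand-ins for one batch: 4 samples, 6 classes.
logits = torch.randn(4, 6)            # raw model output, float32
labels = torch.tensor([0, 2, 5, 1])   # class indices, int64

# Passing the argmax output: both tensors are int64 with shape (4,).
class_ids = torch.argmax(logits, dim=1)
try:
    loss_fn(class_ids, labels)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)

# Passing the raw logits works and gives a scalar loss.
loss = loss_fn(logits, labels)
print(loss.dtype, loss.shape)
```

Because the argmax output has the same shape as the labels, the loss function appears to interpret the labels as class probabilities, which would explain the wording of the error.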

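To fix this, you can compute the loss before changing the value of preds (i.e., before preds = torch.argmax(preds, dim=1)), as shown below. Alternatively, you can give the output of model() a different name, such as outp, and pass that to the loss function, e.g. loss_fn(outp, labels).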

# get model predictions for the current batch
preds = model(sent_id, mask)

# compute the loss between actual and predicted values
loss = loss_fn(preds, labels)
preds = torch.argmax(preds, dim=1)
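For a version that runs on its own, here is a hedged sketch of one corrected training step. The `nn.Linear` model and the random `features` tensor are hypothetical stand-ins for the BERT classifier and the tokenized batch in the question; only the order of operations is the point.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the classifier: 8 features -> 6 classes.
model = nn.Linear(8, 6)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

features = torch.randn(4, 8)          # made-up batch of 4 samples
labels = torch.tensor([0, 2, 5, 1])   # class indices, int64

model.zero_grad()
logits = model(features)              # shape (4, 6), float32

# The loss is computed on the raw logits so gradients can flow back.
loss = loss_fn(logits, labels)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()

# argmax is applied only afterwards, to read off the predicted classes.
pred_classes = torch.argmax(logits, dim=1).detach().cpu().tolist()
print(loss.item(), pred_classes)
```

Keeping the logits and the predicted classes under separate names (here `logits` and `pred_classes`) avoids accidentally passing class indices to the loss function.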

0
votes

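I got the same error below:

RuntimeError: Expected floating point type for target with class probabilities, got Long

when I set an int-type tensor as the target argument of CrossEntropyLoss(), as shown below: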

import torch
from torch import nn

tensor1 = torch.tensor([0., 1., 2.])
tensor2 = torch.tensor([3, 4, 5]) # Here

cel = nn.CrossEntropyLoss()
cel(input=tensor1, target=tensor2) # Error
                 # ↑↑↑↑ Here ↑↑↑↑

So, I set a float-type tensor as the target argument of CrossEntropyLoss(), and then I could get the result as shown below:

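*Notes:

  • A target tensor whose size is the same as the input tensor is treated as class probabilities, and its type must be float.
  • A target tensor whose size is different from the input tensor is treated as class indices, and its type must be an integer type (typically torch.long).

import torch
from torch import nn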

tensor1 = torch.tensor([0., 1., 2.])
tensor2 = torch.tensor([3., 4., 5.]) # Here

cel = nn.CrossEntropyLoss()
cel(input=tensor1, target=tensor2) # tensor(14.8913)
                 # ↑↑↑↑ Here ↑↑↑↑
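To illustrate the two kinds of target described in the notes, here is a small sketch with made-up logits for 2 samples and 3 classes (all values and the names `logits`, `target_indices`, and `target_probs` are assumptions for illustration). It assumes a PyTorch version that supports class-probability targets.

```python
import torch
from torch import nn

cel = nn.CrossEntropyLoss()

# Made-up logits for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3]])

# Class indices: shape (2,), dtype int64.
target_indices = torch.tensor([0, 1])
loss_from_indices = cel(logits, target_indices)

# Class probabilities: same shape as the logits, dtype float32.
# The one-hot rows encode the same labels as the indices above.
target_probs = torch.tensor([[1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0]])
loss_from_probs = cel(logits, target_probs)

print(loss_from_indices, loss_from_probs)
```

With one-hot probabilities that encode the same labels, the two calls should give the same loss, which is a quick way to check that the targets are in the intended form.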