Using nn.CrossEntropyLoss between outputs and target labels

Question

I am using the following code, a function to train the model:

def train():
  
  model.train()

  total_loss, total_accuracy = 0, 0
  
  # empty list to save model predictions
  total_preds=[]
  
  # iterate over batches
  for step,batch in enumerate(train_dataloader):
    
    # progress update after every 50 batches.
    if step % 50 == 0 and not step == 0:
      print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))

    # push the batch to gpu
    #batch = [r for r in batch]
 
    sent_id, mask, labels = batch['input_ids'],batch['attention_mask'],batch['labels']
    print(6)
    print(sent_id)
    print(mask)
    print(labels)
    print(batch['input_ids'].shape)
    print(batch['attention_mask'].shape)
    print(batch['labels'].shape)

    # clear previously calculated gradients 
    model.zero_grad() 
    print(7)       

    # get model predictions for the current batch
    preds = model(sent_id, mask)
    print(8)
    print(len(preds))
    print(len(labels))
    print(preds.size())
    
   
    preds =torch.argmax(preds, dim=1)
    print(preds)
    print(labels)

    # compute the loss between actual and predicted values
    loss = loss_fn(preds, labels)
    print(9)

    # add on to the total loss
    total_loss = total_loss + loss.item()
    print(10)

    # backward pass to calculate the gradients
    loss.backward()

    # clip the the gradients to 1.0. It helps in preventing the exploding gradient problem
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

    # update parameters
    optimizer.step()

    # model predictions are stored on GPU. So, push it to CPU
    preds=preds.numpy()

    # append the model predictions
    total_preds.append(preds)

  # compute the training loss of the epoch
  avg_loss = total_loss / len(train_dataloader)
  
  # predictions are in the form of (no. of batches, size of batch, no. of classes).
  # reshape the predictions in form of (number of samples, no. of classes)
  total_preds  = np.concatenate(total_preds, axis=0)

  #returns the loss and predictions
  return avg_loss, total_preds

import torch.nn as nn

loss_fn=nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001)

I get an error when computing the loss. preds is tensor([5, 1, 1, 1, 0, 2, 1, 4, 2, 3, 0, 2, 0, 1, 0, 3, 5, 3, 1, 2, 0, 2, 2, 1, 0, 1, 4, 0, 5, 5, 4, 5, 0, 2, 0, 1, 4, 0, 0, 3, 5, 1, 1, 1, 4, 4, 4, 1, 2, 1, 3, 3, 2, 1, 0, 2, 0, 4, 4, 4, 3, 2, 0, 5])

labels is tensor([0, 0, 1, 2, 3, 0, 0, 0, 0, 1, 1, 0, 0, 0, 4, 0, 0, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 2, 1, 3, 2, 0, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 5, 0, 0, 3, 0, 0, 1, 0, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0, 0]). I pass them to the loss as loss = loss_fn(preds, labels) and get this error:

in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   2844     if size_average is not None or reduce is not None:
   2845         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
   2847 
   2848 

RuntimeError: Expected floating point type for target with class probabilities, got Long

Answer 1

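RuntimeError: Expected floating point type for target with class probabilities, got Long

The error is quite clear. You need to convert the dtype of your target tensor to float. This has to do with the loss function you are using. Since you chose CE loss, you end up with probabilities, and those probabilities are naturally floating point numbers. That means your target should be float as well. For example, you might have a target tensor a = [1, 0, 0, 1], which you need to convert to [1.0, 0.0, 0.0, 1.0].

You can check all the types using the table below.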

╔══════════════════════════╦═══════════════════════════════╦════════════════════╦═════════════════════════╗
║        Data type         ║             dtype             ║     CPU tensor     ║       GPU tensor        ║
╠══════════════════════════╬═══════════════════════════════╬════════════════════╬═════════════════════════╣
║ 32-bit floating point    ║ torch.float32 or torch.float  ║ torch.FloatTensor  ║ torch.cuda.FloatTensor  ║
║ 64-bit floating point    ║ torch.float64 or torch.double ║ torch.DoubleTensor ║ torch.cuda.DoubleTensor ║
║ 16-bit floating point    ║ torch.float16 or torch.half   ║ torch.HalfTensor   ║ torch.cuda.HalfTensor   ║
║ 8-bit integer (unsigned) ║ torch.uint8                   ║ torch.ByteTensor   ║ torch.cuda.ByteTensor   ║
║ 8-bit integer (signed)   ║ torch.int8                    ║ torch.CharTensor   ║ torch.cuda.CharTensor   ║
║ 16-bit integer (signed)  ║ torch.int16 or torch.short    ║ torch.ShortTensor  ║ torch.cuda.ShortTensor  ║
║ 32-bit integer (signed)  ║ torch.int32 or torch.int      ║ torch.IntTensor    ║ torch.cuda.IntTensor    ║
║ 64-bit integer (signed)  ║ torch.int64 or torch.long     ║ torch.LongTensor   ║ torch.cuda.LongTensor   ║
║ Boolean                  ║ torch.bool                    ║ torch.BoolTensor   ║ torch.cuda.BoolTensor   ║
╚══════════════════════════╩═══════════════════════════════╩════════════════════╩═════════════════════════╝

To convert a tensor to another data type, you can use something like

sample_tensor=sample_tensor.type(torch.FloatTensor)

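or

sample_tensor=sample_tensor.to(torch.float)

(Both calls return a converted tensor rather than changing the original in place, so the result does need to be reassigned.)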
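As a minimal sketch of the two conversions above, assuming a small made-up integer tensor (the name `sample_tensor` and its values are only for illustration), the following shows that the original tensor keeps its dtype while the returned tensors are float32:

```python
import torch

# Hypothetical integer tensor, used only for illustration.
sample_tensor = torch.tensor([1, 0, 0, 1])

# Both calls return a new tensor with the requested dtype.
as_float_a = sample_tensor.type(torch.FloatTensor)
as_float_b = sample_tensor.to(torch.float)

print(sample_tensor.dtype)  # dtype of the original tensor
print(as_float_a.dtype)     # dtype after .type(torch.FloatTensor)
print(as_float_b.dtype)     # dtype after .to(torch.float)
```

Printing the dtypes is an easy way to check what is actually being passed to a loss function.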

Answer 2

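The problem is that you are passing the wrong `preds` tensor to `loss_fn`. Look closely and you will see that you are passing the output of `preds = torch.argmax(preds, dim=1)`, when you should be passing the output of `preds = model(sent_id, mask)`. As written, two tensors of dtype int64 reach the loss function. However, the loss function (CrossEntropyLoss) expects a tensor of dtype float32 as its first argument, the input (see "Examples" at https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html). That is why you get the error "Expected floating point type...".

To fix this, compute the loss before you overwrite `preds` (that is, before `preds = torch.argmax(preds, dim=1)`), as shown below. Alternatively, give the output of `model()` a different name, for example `outp`, and pass that to the loss function: `loss_fn(outp, labels)`.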

# get model predictions for the current batch
preds = model(sent_id, mask)

# compute the loss between actual and predicted values
loss = loss_fn(preds, labels)
preds = torch.argmax(preds, dim=1)
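To make the ordering concrete, here is a self-contained sketch that can be run without the original model. It assumes a hypothetical stand-in model (a single `nn.Linear` layer producing 6 class scores) and random data in place of `sent_id`, `mask`, and the real labels; only the order of the loss and `argmax` steps mirrors the fix described above.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the real model: maps 8 features to 6 class scores.
model = nn.Linear(8, 6)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(64, 8)        # made-up batch of 64 samples
labels = torch.randint(0, 6, (64,))  # class indices, dtype torch.int64

# Raw class scores (logits): shape (64, 6), floating point.
logits = model(features)

# Compute the loss on the raw scores, not on argmax output.
loss = loss_fn(logits, labels)
loss.backward()

# Take argmax afterwards, only to obtain predicted class indices.
pred_classes = torch.argmax(logits, dim=1)

print(logits.shape, loss.item(), pred_classes.shape)
```

Keeping the raw scores and the predicted classes under separate names avoids accidentally feeding the integer predictions into the loss.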

Answer 3

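I got the same error:

RuntimeError: Expected floating point type for target with class probabilities, got Long

when I passed a tensor of `int` type as the `target` argument of `CrossEntropyLoss()`, as shown below: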

import torch
from torch import nn

tensor1 = torch.tensor([0., 1., 2.])
tensor2 = torch.tensor([3, 4, 5]) # Here

cel = nn.CrossEntropyLoss()
cel(input=tensor1, target=tensor2) # Error
                 # ↑↑↑↑ Here ↑↑↑↑

So I passed a tensor of `float` type as the `target` argument of `CrossEntropyLoss()`, and then I got the result shown below:

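*Notes:

A `target` tensor with the same size as the `input` tensor is treated as class probabilities and must be of `float` type.
A `target` tensor whose size differs from the `input` tensor is treated as class indices and must be of `int` type (typically `torch.long`).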

import torch
from torch import nn

tensor1 = torch.tensor([0., 1., 2.])
tensor2 = torch.tensor([3., 4., 5.]) # Here

cel = nn.CrossEntropyLoss()
cel(input=tensor1, target=tensor2) # tensor(14.8913)
                 # ↑↑↑↑ Here ↑↑↑↑
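The two target formats can be compared side by side. This is a sketch with made-up logits for a batch of 2 samples and 3 classes, assuming a reasonably recent PyTorch release that accepts class-probability targets; when the probability targets are one-hot, both calls should give the same loss.

```python
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()

# Made-up logits: 2 samples, 3 classes (floating point).
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

# Target as class indices: shape (2,), integer dtype (torch.int64).
class_indices = torch.tensor([0, 2])

# Target as class probabilities: same shape as logits, float dtype.
class_probs = torch.tensor([[1.0, 0.0, 0.0],
                            [0.0, 0.0, 1.0]])

loss_from_indices = loss_fn(logits, class_indices)
loss_from_probs = loss_fn(logits, class_probs)

print(loss_from_indices.item(), loss_from_probs.item())
```

In the original question the first argument was an integer tensor with the same shape as the labels, so the labels were interpreted as class probabilities, which is what triggered the dtype error.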
