PyTorch: IndexError: index out of range in self (on CPU) / Assertion `srcIndex < srcSelectDimSize` failed (on CUDA). How to solve?


Today, when I used BERT with PyTorch on CUDA, I got the following error:

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [234,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Epoch [1/100]
Iter:      0,  Train Loss:   1.1,  Train Acc: 39.06%,  Val Loss:   1.0,  Val Acc: 51.90%,  Time: 0:00:04 *
Iter:     10,  Train Loss:  0.99,  Train Acc: 57.81%,  Val Loss:   1.0,  Val Acc: 52.01%,  Time: 0:00:11 *
Iter:     20,  Train Loss:   1.0,  Train Acc: 42.19%,  Val Loss:  0.99,  Val Acc: 52.01%,  Time: 0:00:17 *
Iter:     30,  Train Loss:   1.0,  Train Acc: 40.62%,  Val Loss:  0.99,  Val Acc: 52.12%,  Time: 0:00:23 *
Iter:     40,  Train Loss:   1.0,  Train Acc: 50.00%,  Val Loss:  0.98,  Val Acc: 52.12%,  Time: 0:00:29 *
Iter:     50,  Train Loss:   1.1,  Train Acc: 43.75%,  Val Loss:  0.98,  Val Acc: 52.12%,  Time: 0:00:35 *
Traceback (most recent call last):
  File "/content/drive/MyDrive/Prediction/run.py", line 38, in <module>
    train(config, model, train_iter, dev_iter, test_iter)
  File "/content/drive/MyDrive/Prediction/train_eval.py", line 50, in train
    outputs = model(trains)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/Prediction/models/BERT+Covid.py", line 68, in forward
    output  = self.bert(context, attention_mask=mask)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py", line 1005, in forward
    return_dict=return_dict,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py", line 589, in forward
    output_attentions,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py", line 475, in forward
    past_key_value=self_attn_past_key_value,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py", line 408, in forward
    output_attentions,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py", line 323, in forward
    attention_scores = attention_scores / math.sqrt(self.attention_head_size)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [234,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

 # ... (I skipped several similar assertion lines due to the character limit)

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [235,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed

To find out what was actually going wrong, I ran my code again on the CPU and got this error: IndexError: index out of range in self.
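Re-running on the CPU is one way to surface the real error. Another option, which the traceback itself hints at, is to make CUDA kernel launches synchronous so the device-side assert is reported at the offending call. A minimal sketch; the variable must be set before PyTorch initializes CUDA:

```python
import os

# Set before the first CUDA call (i.e., before `import torch`, or at least
# before any tensor touches the GPU): kernels then run synchronously, so
# the device-side assert surfaces at the Python line that triggered it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```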

Traceback (most recent call last):
  File "/content/drive/MyDrive/Prediction/run.py", line 37, in <module>
    train(config, model, train_iter, dev_iter, test_iter)
  File "/content/drive/MyDrive/Prediction/train_eval.py", line 49, in train
    outputs = model(trains)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/Prediction/models/BERT+Covid.py", line 66, in forward
    output  = self.bert(context, attention_mask=mask, )
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py", line 993, in forward
    past_key_values_length=past_key_values_length,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py", line 215, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py", line 160, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2043, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Based on the guidance I found online, I have already ruled out the following:

  1. The input length does not exceed the model's maximum length (I set the pad size to 98, and I printed the input shape right before the error; it is indeed (batch_size, pad_size)).

  2. len(tokenizer) == model.config.vocab_size, so that is not the problem.
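The two checks above can be sketched as one helper. The names are hypothetical; `batch_ids` stands in for a batch of input_ids, and `vocab_size` for len(tokenizer) / model.config.vocab_size. Plain nested lists are used to keep the sketch framework-free; the same logic works on tensors with torch.max / torch.min.

```python
def check_batch(batch_ids, vocab_size, pad_size):
    """Raise if any sequence exceeds pad_size or any token id is out of range."""
    if any(len(seq) > pad_size for seq in batch_ids):
        raise ValueError("a sequence is longer than pad_size")
    flat = [tok for seq in batch_ids for tok in seq]
    if min(flat) < 0 or max(flat) >= vocab_size:
        raise ValueError(f"token id out of range: max={max(flat)}, vocab_size={vocab_size}")
    return True

# Valid ids for a vocab of 21128 entries are 0..21127.
check_batch([[101, 2769, 102], [101, 21127, 102]], vocab_size=21128, pad_size=98)
```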

I have no idea what the problem could be now. Can anyone help me?

My model structure is:

import torch
import torch.nn as nn
from transformers import BertConfig, BertModel
# FCLayer is a project-defined fully connected layer module (not shown here)

class Model(nn.Module):

    def __init__(self, config):
        super(Model, self).__init__()
        self.modelConfig = BertConfig.from_pretrained('./bert_pretrain/config.json')
        self.bert = BertModel.from_pretrained(config.bert_path,config=self.modelConfig)
        for param in self.bert.parameters():
            param.requires_grad = False
        self.cls_fc_layer = FCLayer(config.hidden_size, config.word_size, config.dropout_rate)
        self.label_classifier = FCLayer(
            config.word_size+config.numerical_size,
            config.num_classes,
            config.dropout_rate,
            use_activation=False,
        )

    def forward(self, x):
        context = x[0]  # input token ids
        mask = x[2]  # mask
        numerical=x[3] #size(batch_size,18)
        
        output  = self.bert(context, attention_mask=mask)
        pooled_output=output[1]
        ##size(batch_size,768)
        pooled_output = self.cls_fc_layer(pooled_output)
        ##size(batch_size,18)
        concat_h = torch.cat([pooled_output, numerical], dim=-1)
        ##size(batch_size,36)
        logits = self.label_classifier(concat_h)
        return logits
python machine-learning nlp pytorch bert-language-model
2 Answers

3 votes

I had the same problem. The issue was that we had added a new special token (padding token = [PAD]) to the tokenizer. Changing the padding token to a token the tokenizer already knows (in my case, the eos_token) solved the problem :)
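A framework-free sketch of why a newly added [PAD] token crashes the lookup: the tokenizer hands out an id equal to the old vocab size, but the embedding matrix still has only the old number of rows. (In transformers, the alternative to reusing a known token is to call model.resize_token_embeddings(len(tokenizer)) after adding the new token.)

```python
# Stand-ins for the tokenizer vocabulary and the embedding matrix size.
vocab = ["[CLS]", "[SEP]", "hello"]   # tokenizer side
embedding_rows = len(vocab)           # model side, fixed when the model was built

vocab.append("[PAD]")                 # new special token added afterwards
pad_id = vocab.index("[PAD]")         # id 3, but the matrix only has rows 0..2
assert pad_id >= embedding_rows       # hence the out-of-range lookup at embedding time
```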


2 votes

I have solved it!

By printing out the maximum input_id of each batch:

for i, (trains, labels) in enumerate(train_iter):
    print("train max input:", torch.max(trains[0]))
    print("train min input:", torch.min(trains[0]))
    print("train max label:", torch.max(labels))
    print("train min label:", torch.min(labels))

I got the output below. The maximum input_id is 21128, while my tokenizer's length is 21128, which means the largest valid input_id should be 21127. That is exactly where the index goes out of range!

train max input: tensor(21128, device='cuda:0')
train min input: tensor(0, device='cuda:0')
train max label: tensor(2, device='cuda:0')
train min label: tensor(0, device='cuda:0')

The error probably occurred because I had manually edited the BERT model's vocab.txt file (sorry, I am a novice...). I fixed the problem by reloading the original BERT model, vocab, and config.
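Generalizing the per-batch print above, one can scan every batch once before training and collect any offending id. This is a sketch with hypothetical names; plain nested lists stand in for tensors, but the same logic works on a DataLoader of tensors.

```python
def find_bad_ids(batches, vocab_size):
    """Return (batch_index, token_id) pairs for every out-of-range token id."""
    bad = []
    for i, (input_ids, _labels) in enumerate(batches):
        for seq in input_ids:
            bad.extend((i, tok) for tok in seq if not 0 <= tok < vocab_size)
    return bad

# id 21128 is invalid for a vocab of 21128 entries (valid range: 0..21127)
batches = [([[101, 2769, 102]], [0]), ([[101, 21128, 102]], [1])]
print(find_bad_ids(batches, vocab_size=21128))  # → [(1, 21128)]
```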
