How is the input sequence length of a transformer determined?

The texts I feed to BERT are very short. I chose a maximum length of 31 and got the following error:

ValueError: Wrong shape for input_ids (shape torch.Size([31])) or attention_mask (shape torch.Size([31]))
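In transformers 2.x this particular ValueError is raised when the input tensors are one-dimensional: the model expects a batch dimension, i.e. shape (batch_size, sequence_length), not (sequence_length,). A minimal sketch, assuming a standard BertModel (the model name and text here are illustrative):

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')

    ids = tokenizer.encode('a short text', max_length=31, pad_to_max_length=True)
    input_ids = torch.tensor(ids)       # 1-D: torch.Size([31]) -- passing this raises the error
    input_ids = input_ids.unsqueeze(0)  # 2-D: torch.Size([1, 31]) -- a batch of one, as expected
    attention_mask = (input_ids != tokenizer.pad_token_id).long()
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)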

How do I set the input length for BERT?

I am using transformers version 2.9.0.
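For setting the length itself, the 2.x tokenizer API can handle padding, truncation, and special tokens, and return batched tensors directly. A sketch under that assumption (text_a and text_b are placeholder strings, not names from the question):

    encoded = tokenizer.encode_plus(
        text_a,                      # placeholder: first text
        text_b,                      # placeholder: optional second text
        max_length=31,
        pad_to_max_length=True,
        return_tensors='pt',
    )
    # encoded['input_ids'].shape      -> torch.Size([1, 31])
    # encoded['attention_mask'].shape -> torch.Size([1, 31])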

The model call:

Code related to tokenizing the text and building the transformer input:

def _get_transformer_input2(tokens_a, tokens_b, max_seq_length, tokenizer, model_specs):
    # Build the special-token layout: [CLS] tokens_a [SEP] tokens_b [SEP] for BERT;
    # <s> tokens_a </s></s> tokens_b </s> for RoBERTa.
    tokens = []
    segment_ids = []
    tokens.append(model_specs['CLS_TOKEN'])
    segment_ids.append(0)
    for token in tokens_a:
        tokens.append(token)
        segment_ids.append(0)
    tokens.append(model_specs['SEP_TOKEN'])
    segment_ids.append(0)
    if model_specs['MODEL_TYPE'] == 'roberta':
        # RoBERTa separates the two segments with a double SEP (</s></s>)
        # and does not use segment ids, so they stay 0 throughout.
        tokens.append(model_specs['SEP_TOKEN'])
        segment_ids.append(0)
        for token in tokens_b:
            tokens.append(token)
            segment_ids.append(0)
        tokens.append(model_specs['SEP_TOKEN'])
        segment_ids.append(0)
    else:
        # BERT-style models mark the second segment with segment id 1.
        for token in tokens_b:
            tokens.append(token)
            segment_ids.append(1)
        tokens.append(model_specs['SEP_TOKEN'])
        segment_ids.append(1)
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    # The mask has 1 for real tokens and 0 for padding tokens. Only real
    # tokens are attended to.
    input_mask = [1] * len(input_ids)
    # Zero-pad up to the sequence length. RoBERTa's <pad> token id is 1,
    # whereas BERT's is 0.
    while len(input_ids) < max_seq_length:
        if model_specs['MODEL_TYPE'] == 'roberta':
            input_ids.append(1)
        else:
            input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)
    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length
    return tokens, input_ids, input_mask, segment_ids
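The model call itself is not shown above. A hypothetical sketch of how the returned lists could be fed to a BERT-style model (model, tokenizer, and model_specs stand in for the question's objects; the important part is the batch dimension added by wrapping each list in an outer list):

    import torch

    # tokens_a / tokens_b are assumed to be word-piece tokenized already,
    # e.g. tokens_a = tokenizer.tokenize('first segment')
    tokens, input_ids, input_mask, segment_ids = _get_transformer_input2(
        tokens_a, tokens_b, 31, tokenizer, model_specs)

    input_ids = torch.tensor([input_ids])        # shape (1, 31), not (31,)
    attention_mask = torch.tensor([input_mask])  # same batch dimension
    token_type_ids = torch.tensor([segment_ids])
    outputs = model(input_ids=input_ids,
                    attention_mask=attention_mask,
                    token_type_ids=token_type_ids)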
python deep-learning bert-language-model transformer-model