Sentiment analysis (classification) with the "sst2" dataset


I'm trying to fine-tune the "distilbert-base-uncased" model with the Hugging Face library, but I'm running into this error:

IndexError: Target -1 is out of bounds.

After loading my dataset (sst2) with Hugging Face Datasets and tokenizing it:

small_train_dataset = encoded_dataset["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = encoded_dataset["test"].shuffle(seed=42).select(range(1000))
full_train_dataset = encoded_dataset["train"]
full_eval_dataset = encoded_dataset["test"]
# Define the training parameters

metric_name = "accuracy"
model_name = model_checkpoint.split("/")[-1]
weight_decay = 0.01
lr = 2e-5
batch_size = 16
num_train_epochs = 5
import numpy as np

# Define a function to evaluate the model

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
# Fine-tune the model

args = TrainingArguments(
    f"{model_name}-finetuned-{task}",
    evaluation_strategy = "steps",             
    eval_steps=10,
    save_strategy = "steps",                   
    learning_rate = lr,                   
    per_device_train_batch_size = batch_size,     
    per_device_eval_batch_size = batch_size,      
    num_train_epochs = num_train_epochs,                
    weight_decay = weight_decay,                    
    load_best_model_at_end = True,          
    metric_for_best_model = metric_name,          
    push_to_hub = False,                    
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset = small_train_dataset,           
    eval_dataset = small_eval_dataset,            
    tokenizer = tokenizer,               
    compute_metrics = compute_metrics         
)
trainer.train()
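
For reference, the setup that produces encoded_dataset, model, tokenizer and metric is not shown above; assuming the standard GLUE/SST-2 recipe, it would look roughly like the sketch below (the names model_checkpoint and task match the ones used above, everything else is an assumption):

from datasets import load_dataset
import evaluate
from transformers import AutoTokenizer, AutoModelForSequenceClassification

task = "sst2"
model_checkpoint = "distilbert-base-uncased"

# Load GLUE/SST-2 and the accuracy metric (assumed; this step is not shown in the question)
dataset = load_dataset("glue", task)
metric = evaluate.load("accuracy")

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

def tokenize(batch):
    # SST-2 stores the text in the "sentence" column
    return tokenizer(batch["sentence"], truncation=True)

encoded_dataset = dataset.map(tokenize, batched=True)

# Two output classes (negative/positive) for SST-2
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=2)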

I tried changing the metric, but it didn't help.

Tags: python, deep-learning, nlp, artificial-intelligence, huggingface-transformers
1 Answer

The problem was the dataset. I was using encoded_dataset["test"] for validation, but despite its name that split consists of unlabeled data: every example carries the label -1. Changing it to encoded_dataset["validation"] solved the problem.
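
A quick way to confirm this, assuming the label column is named "label" as in the GLUE version of sst2, is to count the labels in each split; the test split contains only -1, while validation has the real 0/1 labels:

from collections import Counter

print(Counter(encoded_dataset["test"]["label"]))        # only -1 (unlabeled)
print(Counter(encoded_dataset["validation"]["label"]))  # 0/1 sentiment labels

The evaluation datasets then just switch to the validation split:

small_eval_dataset = encoded_dataset["validation"].shuffle(seed=42).select(range(1000))
full_eval_dataset = encoded_dataset["validation"]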
