ROUGE score metric does not work for a non-English (Arabic) language


The ROUGE score metric does not work for Arabic evaluation: every score comes back as 0. What should I do?

    !pip install rouge_score

    from datasets import load_metric

    metric = load_metric("rouge")
    pred_str = ['السلام عليكم كيف حالك']
    label_str = ['السلام عليكم صديقي كيف حالك']
    metric.add_batch(predictions=pred_str, references=label_str)
    metric.compute()

Output:

    {'rouge1': AggregateScore(low=Score(precision=0.0, recall=0.0, fmeasure=0.0), mid=Score(precision=0.0, recall=0.0, fmeasure=0.0), high=Score(precision=0.0, recall=0.0, fmeasure=0.0)),
     'rouge2': AggregateScore(low=Score(precision=0.0, recall=0.0, fmeasure=0.0), mid=Score(precision=0.0, recall=0.0, fmeasure=0.0), high=Score(precision=0.0, recall=0.0, fmeasure=0.0)),
     'rougeL': AggregateScore(low=Score(precision=0.0, recall=0.0, fmeasure=0.0), mid=Score(precision=0.0, recall=0.0, fmeasure=0.0), high=Score(precision=0.0, recall=0.0, fmeasure=0.0)),
     'rougeLsum': AggregateScore(low=Score(precision=0.0, recall=0.0, fmeasure=0.0), mid=Score(precision=0.0, recall=0.0, fmeasure=0.0), high=Score(precision=0.0, recall=0.0, fmeasure=0.0))}
    
nlp metrics huggingface-transformers summarization rouge
1 Answer
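You can use the rouge_scorer module (from the rouge_score package) and pass it a tokenizer that supports Arabic. Also, make sure not to use the stemmer. The code is below: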

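    from rouge_score import rouge_scorer
    from transformers import AutoTokenizer

    # Load the tokenizer first, so it exists before it is passed to RougeScorer.
    model_name = 'arabert'  # name as given in the answer; substitute the full Hugging Face model id
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Keep the stemmer off and score with the Arabic-aware tokenizer.
    r_scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'],
                                        use_stemmer=False, tokenizer=tokenizer)

    pred_str = 'السلام عليكم كيف حالك'
    label_str = 'السلام عليكم صديقي كيف حالك'
    ROU = r_scorer.score(label_str, pred_str)  # score(target, prediction)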
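To see why the tokenizer matters, here is a minimal standard-library sketch. It rests on one assumption: that the default tokenizer keeps only lowercase ASCII letters and digits. The regex below imitates that behaviour and does not call the library. ascii_only_tokens, WhitespaceTokenizer and rouge1 are hypothetical helpers written for this illustration, not part of rouge_score.

```python
import re
from collections import Counter


def ascii_only_tokens(text):
    """Imitates (as an assumption) a tokenizer that keeps only a-z and 0-9."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return cleaned.split()


class WhitespaceTokenizer:
    """Hypothetical tokenizer exposing tokenize(); it simply splits on whitespace."""

    def tokenize(self, text):
        return text.split()


def rouge1(reference_tokens, prediction_tokens):
    """Unigram-overlap precision, recall and F1 (ROUGE-1) computed on token lists."""
    ref_counts = Counter(reference_tokens)
    pred_counts = Counter(prediction_tokens)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & pred_counts).values())
    precision = overlap / len(prediction_tokens) if prediction_tokens else 0.0
    recall = overlap / len(reference_tokens) if reference_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


pred_str = 'السلام عليكم كيف حالك'
label_str = 'السلام عليكم صديقي كيف حالك'

# ASCII-only tokenization discards every Arabic character, so nothing can match.
print(ascii_only_tokens(pred_str))  # []

# Whitespace tokenization keeps the Arabic words, so the overlap is counted.
tok = WhitespaceTokenizer()
precision, recall, f1 = rouge1(tok.tokenize(label_str), tok.tokenize(pred_str))
print(precision, recall, f1)
```

With whitespace tokens, the four predicted words all appear in the five-word reference, so precision is 1.0 and recall is 0.8. As far as I know, any object with a tokenize() method like this can be passed as tokenizer= to RougeScorer. A subword tokenizer from Hugging Face will generally give different numbers, because overlaps are then counted on subword pieces rather than on whole words.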
    