I get the following error when loading BertEmbedding.

Code:
from transformers import BertModel, BertTokenizer

name = "microsoft/codebert-base"

print("[ Using pretrained BERT embeddings ]")
self.bert_tokenizer = BertTokenizer.from_pretrained(name, do_lower_case=lower_case)
self.bert_model = BertModel.from_pretrained(name)
if fix_emb:
    print("[ Fix BERT layers ]")
    self.bert_model.eval()
    for param in self.bert_model.parameters():
        param.requires_grad = False
else:
    print("[ Finetune BERT layers ]")
    self.bert_model.train()
[ Using pretrained BERT embeddings ]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'RobertaTokenizer'.
The class this function is called from is 'BertTokenizer'.
The name codebert-base is a bit misleading, because the model is actually a RoBERTa. The BERT and RoBERTa architectures are similar and show only minor differences, but the tokenizers are completely different (as are the associated methods, though that is not relevant here).

You should load microsoft/codebert-base like this:
from transformers import RobertaModel
from transformers import RobertaTokenizer
name = "microsoft/codebert-base"
tokenizer = RobertaTokenizer.from_pretrained(name)
model = RobertaModel.from_pretrained(name)
Or you can use the Auto classes, which will pick the appropriate class for you:
from transformers import AutoTokenizer, AutoModel
name = "microsoft/codebert-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
Thanks for the RoBERTa answer. I am looking for a code example of a RAG implementation that takes some context documents/strings; as for RoBERTa, I have not found any retriever for it. I would appreciate a working example.
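Not a full RAG pipeline, but the retrieval step itself is just nearest-neighbour search over embeddings, so here is a minimal sketch. The `embed` function below is a deliberately crude stand-in (a bag-of-characters vector) so the example runs without downloading anything; in practice you would replace it with CodeBERT embeddings, e.g. the `last_hidden_state[:, 0]` output of the model loaded above. The function names and the hash-bucket size are my own choices, not part of any library API.

```python
import numpy as np

def embed(texts):
    # Stub encoder for illustration only: a bag-of-characters vector.
    # In a real retriever, replace this with CodeBERT, e.g.
    #   model(**tokenizer(t, return_tensors="pt")).last_hidden_state[:, 0]
    vecs = np.zeros((len(texts), 256))
    for i, t in enumerate(texts):
        for ch in t.lower():
            vecs[i, ord(ch) % 256] += 1.0
    return vecs

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query by cosine similarity."""
    doc_vecs = embed(docs)
    q_vec = embed([query])[0]
    # Cosine similarity between the query vector and every document vector.
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    sims = doc_vecs @ q_vec / np.maximum(norms, 1e-9)
    top = np.argsort(-sims)[:k]
    return [docs[i] for i in top]

docs = [
    "def add(a, b): return a + b",
    "SELECT * FROM users WHERE id = 1",
    "def subtract(a, b): return a - b",
]
print(retrieve("def add(a, b): return a + b", docs, k=1))
```

The retrieved strings would then be concatenated into the prompt (or passed as context) for the generator model; that second half is what a RAG library would add on top of this.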