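I am using the Hugging Face Trainer to train a custom model that subclasses the Llama LLM. After tokenization, my dataset has the fields "input_ids", "labels", and so on, and I also added two custom columns, "interact_ids" and "candidate_ids". However, I cannot get these custom fields inside the forward() function of my model, class LLMWithCustomLayer(LlamaForCausalLM).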
from typing import List, Optional

import torch

# Method of LLMWithCustomLayer(LlamaForCausalLM)
def forward(
    self,
    input_ids: torch.LongTensor = None,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,
    past_key_values: Optional[List[torch.FloatTensor]] = None,
    inputs_embeds: Optional[torch.FloatTensor] = None,
    labels: Optional[torch.LongTensor] = None,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
    interact_ids=None,
    candidate_ids=None,
):
    print('interact_ids, candidate_ids', interact_ids, candidate_ids)  # both print as None
    interact_embs = []
    candidate_embs = []
    # shape is a tuple-like attribute, so it is indexed with [0], not called
    for i in range(interact_ids.shape[0]):
        # O_i = F_i (e_i), computed for the i-th example in the batch
        interact_embs.append(self.item_emb_proj(self.get_item_emb(interact_ids[i])))
        # O_i = F_i (e_i)
        candidate_embs.append(self.item_emb_proj(self.get_item_emb(candidate_ids[i])))
    # replace [CandidateEmb] and [HistoryEmb]
    inputs_embeds = self.replace_hist_candi_token(input_ids, inputs_embeds, interact_embs, candidate_embs)
    return super().forward(
        input_ids=None,  # Llama accepts input_ids or inputs_embeds, not both
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
        labels=labels,
    )
I am new to LLM fine-tuning. Can anyone help me? I would be very grateful.
You need to modify the data collator so that it passes interact_ids and candidate_ids through to your model, because the Trainer ignores extra columns by default.
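For context, the column filtering can be pictured with a short sketch. It assumes the behaviour of remove_unused_columns=True (the default in TrainingArguments), where dataset columns are matched against the parameter names of the model's forward(). The helper columns_kept and the two toy forward functions are hypothetical and only imitate that matching; this is not the Trainer's actual code.

```python
import inspect


def columns_kept(forward_fn, dataset_columns, label_names=("labels", "label", "label_ids")):
    """Return the dataset columns whose names match forward_fn's parameters.

    Hypothetical helper: a simplified imitation of signature-based filtering.
    """
    accepted = set(inspect.signature(forward_fn).parameters) | set(label_names)
    return [c for c in dataset_columns if c in accepted]


def plain_forward(input_ids=None, attention_mask=None, labels=None):
    """Stand-in for a forward() without the custom arguments."""


def custom_forward(input_ids=None, attention_mask=None, labels=None,
                   interact_ids=None, candidate_ids=None):
    """Stand-in for a forward() that declares the custom arguments."""


columns = ["input_ids", "attention_mask", "labels", "interact_ids", "candidate_ids"]
kept_plain = columns_kept(plain_forward, columns)
kept_custom = columns_kept(custom_forward, columns)
print(kept_plain)
print(kept_custom)
```

Under this assumption, columns that are named in forward() survive the filter, which leaves the data collator as the step that has to carry them into the batch.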
Modify the data collator:
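import torch
from transformers import DataCollatorWithPadding

class CustomDataCollator(DataCollatorWithPadding):
    def __call__(self, features):
        # Pad the standard tokenizer fields as usual
        batch = super().__call__(features)
        # Attach the custom columns; assumes each has the same length in every example
        batch["interact_ids"] = torch.tensor([f["interact_ids"] for f in features])
        batch["candidate_ids"] = torch.tensor([f["candidate_ids"] for f in features])
        return batch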
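Note that torch.tensor(...) on a list of lists only works when every example has the same number of ids. If the interaction histories vary in length, they would need padding first. Below is a minimal pure-Python sketch of that step, assuming 0 is a safe padding id (an assumption; pick a value your item embedding treats as padding). pad_id_lists is a hypothetical helper, not part of transformers.

```python
def pad_id_lists(rows, pad_value=0):
    """Right-pad each list in rows to the longest length; also return a 0/1 mask."""
    max_len = max((len(r) for r in rows), default=0)
    padded = [list(r) + [pad_value] * (max_len - len(r)) for r in rows]
    mask = [[1] * len(r) + [0] * (max_len - len(r)) for r in rows]
    return padded, mask


# Two toy examples with histories of different lengths
features = [
    {"interact_ids": [11, 12, 13]},
    {"interact_ids": [21]},
]
padded, mask = pad_id_lists([f["interact_ids"] for f in features])
print(padded)  # [[11, 12, 13], [21, 0, 0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```

The padded lists are rectangular, so they can be handed to torch.tensor(...) inside the collator, and the mask can be used to ignore the padded positions.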
Then pass it to the Trainer:
trainer = Trainer(
    model=LLMWithCustomLayer.from_pretrained("your-llama-model"),
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=CustomDataCollator(tokenizer),
)
Now your forward() method will receive interact_ids and candidate_ids.
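The mechanism behind this is plain keyword unpacking: the Trainer calls the model with the collated batch as keyword arguments, so a key only reaches forward() if it is present in the batch and matches a parameter name. A toy sketch with a stand-in model (ToyModel and the batch contents are made up for illustration and do not use transformers):

```python
class ToyModel:
    """Stand-in for a model whose forward() declares two custom arguments."""

    def forward(self, input_ids=None, labels=None, interact_ids=None, candidate_ids=None):
        # Report what actually arrived for the custom arguments
        return {"interact_ids": interact_ids, "candidate_ids": candidate_ids}

    def __call__(self, **batch):
        return self.forward(**batch)


batch_without = {"input_ids": [[1, 2]], "labels": [[1, 2]]}
batch_with = {**batch_without, "interact_ids": [[7, 8]], "candidate_ids": [[9]]}

model = ToyModel()
seen_without = model(**batch_without)  # custom keys missing from the batch
seen_with = model(**batch_with)        # custom keys added by the collator
print(seen_without)
print(seen_with)
```

When the keys are missing from the batch, the arguments fall back to their None defaults, which matches the symptom in the question.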
Hopefully this works!