Saving a fine-tuned Falcon HuggingFace LLM model

Problem description

I am trying to save my model so that I don't have to re-download the base model every time I want to use it, but nothing seems to work for me, and I would appreciate your help.

The following parameters are used for training:

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

hf_model_name = "tiiuae/falcon-7b-instruct"
dir_path = 'Tiiuae-falcon-7b-instruct'
model_name_is = f"peft-training"
output_dir = f'{dir_path}/{model_name_is}'
logs_dir = f'{dir_path}/logs'
model_final_path = f"{output_dir}/final_model/"
EPOCHS = 3500          # used as max_steps below, i.e. training steps rather than epochs
LOGS = 1               # log every step
SAVES = 700            # save a checkpoint every 700 steps
EVALS = EPOCHS // 100  # evaluate every 35 steps
compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct",
        quantization_config=bnb_config,
        device_map={"": 0},
        trust_remote_code=False
)
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05, # 0.1
    r=64,
    bias="lora_only", # none
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value"
    ],
)
model.config.use_cache = False
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=False)
tokenizer.pad_token = tokenizer.eos_token
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    optim='paged_adamw_32bit',
    max_steps=EPOCHS,
    save_steps=SAVES,
    logging_steps=LOGS,
    logging_dir=logs_dir,
    eval_steps=EVALS,
    evaluation_strategy="steps",
    fp16=True,
    learning_rate=0.001,
    max_grad_norm=0.3,
    warmup_ratio=0.15, # 0.03
    lr_scheduler_type="constant",
    disable_tqdm=True,
)
model.config.use_cache = False
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=448,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=True,
)
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module = module.to(torch.float32)
train_result = trainer.train()

And I save it like this:

metrics = train_result.metrics
max_train_samples = len(train_dataset)
metrics["train_samples"] = min(max_train_samples, len(train_dataset))
# save train results
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
# compute evaluation results
metrics = trainer.evaluate()
max_val_samples = len(eval_dataset)
metrics["eval_samples"] = min(max_val_samples, len(eval_dataset))
# save evaluation results
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)

model.save_pretrained(model_final_path)

Now I have tried again and again to load it (and to load and save it in various other ways), for example adding `lora_model.merge_and_unload()`, or simply using `local_model = AutoModelForCausalLM.from_pretrained(after_merge_model_path)`, and so on, but nothing works for me: everything ends with an error (sometimes the same one, sometimes a different one), and I need your help.
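For reference, the merge-and-save variant I tried looks roughly like this (a sketch reconstructed from the snippets mentioned above; the exact path is just illustrative):

# one of the failed attempts: merge the LoRA weights into the base model,
# save the merged model, and later reload it without PEFT
after_merge_model_path = f"{output_dir}/merged_model/"  # illustrative path

merged_model = model.merge_and_unload()
merged_model.save_pretrained(after_merge_model_path)
tokenizer.save_pretrained(after_merge_model_path)

# in a later session
local_model = AutoModelForCausalLM.from_pretrained(after_merge_model_path)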

In case you think it is a better fit there, I have also asked this question on the HuggingFace forum.

python huggingface-transformers large-language-model inference pre-trained-model
1 Answer

Fine-tuning is done by training an adapter on top of the base model. After training, you save only the adapter, not the base model. So the workflow is as follows:

During training:

  • Download the base model from HF and store it in a cache directory
  • Train the PEFT adapter and save it

During inference:

  • Load the cached HF base model
  • Load the saved PEFT adapter and apply it to the base model

Step 1. Download the HF model into a predefined cache directory:

import os
from pathlib import Path

# set the cache locations for pretrained models and datasets
# (do this before importing transformers so the paths take effect)
os.environ['HF_HOME'] = '/content/assets/hf_cache/'
os.environ['HF_DATASETS_CACHE'] = '/content/assets/hf_datasets/'

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

dir_path = Path('/content')
adapter_final_path = dir_path / "output" / "final_adapter"
base_quantized_path = dir_path / "output" / "base_model_q"

hf_model_name = "tiiuae/falcon-7b-instruct"

# load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model_name, trust_remote_code=False)

tokenizer.pad_token = tokenizer.eos_token


# load the model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)


model = AutoModelForCausalLM.from_pretrained(
        hf_model_name,
        quantization_config=bnb_config,
        device_map={"": 0},
        trust_remote_code=False
)

# note: serializing a 4-bit bitsandbytes-quantized model this way may require
# fairly recent transformers/bitsandbytes versions
model.save_pretrained(base_quantized_path)
tokenizer.save_pretrained(base_quantized_path)
...

Save the PEFT adapter after training:

... train the model...

train_result = trainer.train()


model.save_pretrained(adapter_final_path)

Reload the base model and the PEFT adapter during inference:


# load base model
model = AutoModelForCausalLM.from_pretrained(base_quantized_path)
tokenizer = AutoTokenizer.from_pretrained(base_quantized_path)

# apply saved adapter to the model
model.load_adapter(adapter_final_path)
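
With the adapter applied, generation works like with any causal LM. A minimal usage sketch (the prompt and the generation settings are just placeholders):

prompt = "Explain what a LoRA adapter is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# generate a short completion with the fine-tuned model
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))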