I downloaded the package from https://github.com/tencent-ailab/IP-Adapter and ran the following command to train an IP-Adapter Plus model (input: text + image, output: image):
accelerate launch --num_processes 2 --multi_gpu --mixed_precision "fp16" \
tutorial_train_plus.py \
--pretrained_model_name_or_path="stable-diffusion-v1-5/" \
--image_encoder_path="models/image_encoder/" \
--data_json_file="assets/prompt_image.json" \
--data_root_path="assets/train/" \
--mixed_precision="fp16" \
--resolution=512 \
--train_batch_size=2 \
--dataloader_num_workers=4 \
--learning_rate=1e-04 \
--weight_decay=0.01 \
--output_dir="out_model/" \
--save_steps=3
During training the following message appears, but training continues:
Removed shared tensor {'adapter_modules.27.to_k_ip.weight', 'adapter_modules.1.to_v_ip.weight', 'adapter_modules.31.to_k_ip.weight', 'adapter_modules.15.to_k_ip.weight', 'adapter_modules.31.to_v_ip.weight', 'adapter_modules.11.to_k_ip.weight', 'adapter_modules.23.to_k_ip.weight', 'adapter_modules.3.to_k_ip.weight', 'adapter_modules.25.to_v_ip.weight', 'adapter_modules.21.to_k_ip.weight', 'adapter_modules.17.to_v_ip.weight', 'adapter_modules.13.to_k_ip.weight', 'adapter_modules.17.to_k_ip.weight', 'adapter_modules.19.to_v_ip.weight', 'adapter_modules.13.to_v_ip.weight', 'adapter_modules.7.to_v_ip.weight', 'adapter_modules.7.to_k_ip.weight', 'adapter_modules.29.to_k_ip.weight', 'adapter_modules.3.to_v_ip.weight', 'adapter_modules.5.to_v_ip.weight', 'adapter_modules.21.to_v_ip.weight', 'adapter_modules.5.to_k_ip.weight', 'adapter_modules.23.to_v_ip.weight', 'adapter_modules.25.to_k_ip.weight', 'adapter_modules.1.to_k_ip.weight', 'adapter_modules.9.to_v_ip.weight', 'adapter_modules.9.to_k_ip.weight', 'adapter_modules.15.to_v_ip.weight', 'adapter_modules.27.to_v_ip.weight', 'adapter_modules.29.to_v_ip.weight', 'adapter_modules.19.to_k_ip.weight', 'adapter_modules.11.to_v_ip.weight'} while saving. This should be OK, but check by verifying that you don't receive anywarning while reloading
After training, I converted the weights into ip_adapter.bin and ran the inference script ip_adapter-plus_demo.py, which uses these model paths:
base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "models/image_encoder"
ip_ckpt = "out_model/demo_plus_checkpoint/ip_adapter.bin"
It fails with this error:
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ModuleList:
Missing key(s) in state_dict: "1.to_k_ip.weight", "1.to_v_ip.weight", "3.to_k_ip.weight", "3.to_v_ip.weight", "5.to_k_ip.weight", "5.to_v_ip.weight", "7.to_k_ip.weight", "7.to_v_ip.weight", "9.to_k_ip.weight", "9.to_v_ip.weight", "11.to_k_ip.weight", "11.to_v_ip.weight", "13.to_k_ip.weight", "13.to_v_ip.weight", "15.to_k_ip.weight", "15.to_v_ip.weight", "17.to_k_ip.weight", "17.to_v_ip.weight", "19.to_k_ip.weight", "19.to_v_ip.weight", "21.to_k_ip.weight", "21.to_v_ip.weight", "23.to_k_ip.weight", "23.to_v_ip.weight", "25.to_k_ip.weight", "25.to_v_ip.weight", "27.to_k_ip.weight", "27.to_v_ip.weight", "29.to_k_ip.weight", "29.to_v_ip.weight", "31.to_k_ip.weight", "31.to_v_ip.weight".
Which step went wrong and caused this error?
Training and inference now both work. The fix: in the training script tutorial_train_plus.py, set safe_serialization to False:
accelerator.save_state(save_path, safe_serialization=False)
With this change, the checkpoints saved during training contain pytorch_model.bin instead of model.safetensors.
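Why this helps, as far as I can tell: tutorial_train_plus.py builds adapter_modules from unet.attn_processors.values(), so each to_k_ip/to_v_ip weight is reachable under two names in the state dict, once under unet.… and once under adapter_modules.N…. Before writing safetensors, accelerate appears to keep only the first name of each group of shared tensors and delete the rest. Because unet is registered before adapter_modules, the adapter_modules.* names are the ones removed, which matches the set printed in the warning. torch.save (the safe_serialization=False path) keeps both names. Below is a simplified pure-Python model of that first-name-wins rule; drop_shared_aliases and the key names are hypothetical illustrations, not accelerate's code.

```python
from collections import defaultdict


def drop_shared_aliases(state_dict, storage_of):
    """Simplified model of shared-tensor cleanup before a safetensors save.

    state_dict: ordered mapping name -> value.
    storage_of: returns a storage identifier for a value (hypothetical here;
                real code would inspect the tensor's underlying storage).
    Keeps the first name of every group of aliased entries, deletes the rest,
    and returns (cleaned_dict, removed_names).
    """
    groups = defaultdict(list)
    for name, value in state_dict.items():
        groups[storage_of(value)].append(name)
    removed = set()
    for names in groups.values():
        removed.update(names[1:])  # every alias after the first is dropped
    cleaned = {k: v for k, v in state_dict.items() if k not in removed}
    return cleaned, removed


# Hypothetical, shortened key names. The string values stand in for storage ids:
# the unet.* entry and the adapter_modules.* entry share "storage-A".
sd = {
    "unet.down_blocks.0.attn2.processor.to_k_ip.weight": "storage-A",
    "image_proj_model.proj_in.weight": "storage-B",
    "adapter_modules.1.to_k_ip.weight": "storage-A",
}
cleaned, removed = drop_shared_aliases(sd, storage_of=lambda v: v)
print(sorted(removed))  # ['adapter_modules.1.to_k_ip.weight']
```

With real tensors the storage id would come from each tensor's underlying storage rather than from the value itself; the strings only stand in for that.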
After training, change the weight-conversion code from the README's original instructions as follows:
import torch
ckpt = "pytorch_model.bin"  # set correct path
sd = torch.load(ckpt, map_location="cpu")
This produces the model file ip_adapter.bin, which is then used for inference.
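A fuller sketch of the conversion step, for reference. split_ip_adapter_state_dict follows the pattern of the README's conversion loop as I understand it (skip unet.* keys, strip the image_proj_model. and adapter_modules. prefixes). expected_ip_keys and missing_ip_keys are a hypothetical extra check, with the expected names copied from the error above (to_k_ip/to_v_ip at odd indices 1 to 31, assuming the SD 1.5 UNet). The check also shows what went wrong with model.safetensors: with the adapter_modules.* aliases dropped, the only remaining copies were under unet.*, which the loop skips, so ip_adapter came out without any to_k_ip/to_v_ip weights. Paths are placeholders.

```python
def split_ip_adapter_state_dict(sd):
    """Split a training checkpoint into the two dicts that ip_adapter.bin holds."""
    image_proj_sd, ip_sd = {}, {}
    for k, v in sd.items():
        if k.startswith("unet."):
            continue  # frozen UNet weights are not stored in ip_adapter.bin
        if k.startswith("image_proj_model."):
            image_proj_sd[k[len("image_proj_model."):]] = v
        elif k.startswith("adapter_modules."):
            ip_sd[k[len("adapter_modules."):]] = v
    return {"image_proj": image_proj_sd, "ip_adapter": ip_sd}


def expected_ip_keys(first=1, last=31, step=2):
    """Key names reported missing in the error above (SD 1.5 layout assumed)."""
    return {
        f"{i}.{name}.weight"
        for i in range(first, last + 1, step)
        for name in ("to_k_ip", "to_v_ip")
    }


def missing_ip_keys(converted):
    """Expected adapter keys that are absent from the converted dict."""
    return sorted(expected_ip_keys() - set(converted["ip_adapter"]))


def convert(ckpt_path="pytorch_model.bin", out_path="ip_adapter.bin"):
    import torch  # imported here so the pure helpers above work without torch

    sd = torch.load(ckpt_path, map_location="cpu")
    converted = split_ip_adapter_state_dict(sd)
    missing = missing_ip_keys(converted)
    if missing:
        # Fail early instead of producing a file that breaks load_state_dict later
        raise RuntimeError(f"ip_adapter is missing {len(missing)} keys, e.g. {missing[:2]}")
    torch.save(converted, out_path)
    return converted
```

Usage: convert("pytorch_model.bin", "ip_adapter.bin"), with the first path pointing at the checkpoint directory written by save_state. For a base model other than SD 1.5 the expected indices would differ, so the check would need adjusting.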