Inference error after training an IP-Adapter plus model


I downloaded the package from https://github.com/tencent-ailab/IP-Adapter

Then I ran the following command to train an IP-Adapter plus model (input: text + image; output: image):

accelerate launch --num_processes 2 --multi_gpu --mixed_precision "fp16" \
  tutorial_train_plus.py \
  --pretrained_model_name_or_path="stable-diffusion-v1-5/" \
  --image_encoder_path="models/image_encoder/" \
  --data_json_file="assets/prompt_image.json" \
  --data_root_path="assets/train/" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --train_batch_size=2 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="out_model/" \
  --save_steps=3

During training, the following message appears, but training continues:

Removed shared tensor {'adapter_modules.27.to_k_ip.weight', 'adapter_modules.1.to_v_ip.weight', 'adapter_modules.31.to_k_ip.weight', 'adapter_modules.15.to_k_ip.weight', 'adapter_modules.31.to_v_ip.weight', 'adapter_modules.11.to_k_ip.weight', 'adapter_modules.23.to_k_ip.weight', 'adapter_modules.3.to_k_ip.weight', 'adapter_modules.25.to_v_ip.weight', 'adapter_modules.21.to_k_ip.weight', 'adapter_modules.17.to_v_ip.weight', 'adapter_modules.13.to_k_ip.weight', 'adapter_modules.17.to_k_ip.weight', 'adapter_modules.19.to_v_ip.weight', 'adapter_modules.13.to_v_ip.weight', 'adapter_modules.7.to_v_ip.weight', 'adapter_modules.7.to_k_ip.weight', 'adapter_modules.29.to_k_ip.weight', 'adapter_modules.3.to_v_ip.weight', 'adapter_modules.5.to_v_ip.weight', 'adapter_modules.21.to_v_ip.weight', 'adapter_modules.5.to_k_ip.weight', 'adapter_modules.23.to_v_ip.weight', 'adapter_modules.25.to_k_ip.weight', 'adapter_modules.1.to_k_ip.weight', 'adapter_modules.9.to_v_ip.weight', 'adapter_modules.9.to_k_ip.weight', 'adapter_modules.15.to_v_ip.weight', 'adapter_modules.27.to_v_ip.weight', 'adapter_modules.29.to_v_ip.weight', 'adapter_modules.19.to_k_ip.weight', 'adapter_modules.11.to_v_ip.weight'} while saving. This should be OK, but check by verifying that you don't receive anywarning while reloading
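A likely reason for this message, assuming tutorial_train_plus.py builds `adapter_modules` as a `torch.nn.ModuleList` over the UNet's attention processors (as the IP-Adapter training tutorials appear to do): every `to_k_ip`/`to_v_ip` weight is then registered twice, once under `unet.*` and once under `adapter_modules.*`, and both keys point at the same memory. safetensors cannot store shared tensors, so one name per pair is dropped on save. The toy model below (class and key names are hypothetical, not from the repo) reproduces the duplication and groups `state_dict` keys by storage pointer.

```python
import torch.nn as nn


class ToyIPProcessor(nn.Module):
    """Hypothetical stand-in for an IP-Adapter cross-attention processor."""

    def __init__(self, dim: int = 4):
        super().__init__()
        self.to_k_ip = nn.Linear(dim, dim, bias=False)
        self.to_v_ip = nn.Linear(dim, dim, bias=False)


class ToyIPAdapter(nn.Module):
    """Registers the same processor objects twice, like the training wrapper."""

    def __init__(self):
        super().__init__()
        # Processors reachable through the "UNet" ...
        self.unet = nn.ModuleDict({"attn2_processor": ToyIPProcessor()})
        # ... and the very same objects collected again into a ModuleList.
        self.adapter_modules = nn.ModuleList(list(self.unet.values()))


def shared_key_groups(state_dict):
    """Group state_dict keys whose tensors point at the same memory."""
    groups = {}
    for key, tensor in state_dict.items():
        groups.setdefault(tensor.data_ptr(), []).append(key)
    # Keep only pointers referenced by more than one key.
    return sorted(sorted(keys) for keys in groups.values() if len(keys) > 1)


if __name__ == "__main__":
    for group in shared_key_groups(ToyIPAdapter().state_dict()):
        print(group)
```

In the log above it is the `adapter_modules.*` names that were removed, which are exactly the keys the README conversion step filters on. The odd-only indices (1, 3, ..., 31) would also fit this picture if only the cross-attention processors carry `to_k_ip`/`to_v_ip` weights.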

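After training finished, I converted the weights to produce ip_adapter.bin and then ran the inference script ip_adapter-plus_demo.py, with the model paths set as follows: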

base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "models/image_encoder"
ip_ckpt = "out_model/demo_plus_checkpoint/ip_adapter.bin"

It fails with this error:

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ModuleList:
        Missing key(s) in state_dict: "1.to_k_ip.weight", "1.to_v_ip.weight", "3.to_k_ip.weight", "3.to_v_ip.weight", "5.to_k_ip.weight", "5.to_v_ip.weight", "7.to_k_ip.weight", "7.to_v_ip.weight", "9.to_k_ip.weight", "9.to_v_ip.weight", "11.to_k_ip.weight", "11.to_v_ip.weight", "13.to_k_ip.weight", "13.to_v_ip.weight", "15.to_k_ip.weight", "15.to_v_ip.weight", "17.to_k_ip.weight", "17.to_v_ip.weight", "19.to_k_ip.weight", "19.to_v_ip.weight", "21.to_k_ip.weight", "21.to_v_ip.weight", "23.to_k_ip.weight", "23.to_v_ip.weight", "25.to_k_ip.weight", "25.to_v_ip.weight", "27.to_k_ip.weight", "27.to_v_ip.weight", "29.to_k_ip.weight", "29.to_v_ip.weight", "31.to_k_ip.weight", "31.to_v_ip.weight".
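One quick way to see which step went wrong is to open the converted file and count what it contains. This sketch assumes ip_adapter.bin is a dict with `image_proj` and `ip_adapter` sub-dicts, which is the layout the demo reads; an empty `ip_adapter` entry would match the "Missing key(s)" error above. The helper name is hypothetical.

```python
import torch


def summarize_ip_adapter_checkpoint(path):
    """Return the number of tensors stored under each top-level entry."""
    state = torch.load(path, map_location="cpu")
    return {name: len(sub_dict) for name, sub_dict in state.items()}
```

For example, `summarize_ip_adapter_checkpoint("out_model/demo_plus_checkpoint/ip_adapter.bin")` returning a count of 0 for `ip_adapter` points at the saving/conversion steps rather than at the inference script.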

Which step is wrong and causes this error?

machine-learning deep-learning pytorch transformer-model stable-diffusion
1 Answer

The model now trains and runs inference successfully. In the training script tutorial_train_plus.py, set safe_serialization to False:

accelerator.save_state(save_path, safe_serialization=False)

Training then produces pytorch_model.bin instead of model.safetensors.
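Why this helps, as far as I understand it: with safe_serialization=False the checkpoint is presumably written with `torch.save`, whose pickle format records that two keys share storage instead of refusing to save them, so both the `unet.*` and the `adapter_modules.*` names survive. The in-memory round trip below illustrates that behavior; the key names are illustrative only.

```python
import io

import torch


def roundtrip_torch_save(state_dict):
    """Serialize with torch.save (pickle format) and load it back."""
    buffer = io.BytesIO()
    torch.save(state_dict, buffer)
    buffer.seek(0)
    return torch.load(buffer, map_location="cpu")


weight = torch.randn(4, 4)
# Two keys pointing at one tensor, like the duplicated adapter weights.
shared = {
    "unet.attn2.processor.to_k_ip.weight": weight,
    "adapter_modules.1.to_k_ip.weight": weight,
}
restored = roundtrip_torch_save(shared)
print(sorted(restored))
```

Both names come back, and they still share one storage after loading, so nothing the conversion step needs is lost.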

After training, modify the model conversion code from the original README instructions as follows:

import torch

ckpt = "pytorch_model.bin"  # set the correct path
sd = torch.load(ckpt, map_location="cpu")

This produces the model file ip_adapter.bin, which is used for inference.
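Put together, the conversion step might look like the sketch below. It assumes the README's conversion logic as I understand it: keys prefixed `image_proj_model.` go to `image_proj`, keys prefixed `adapter_modules.` go to `ip_adapter`, and `unet.*` keys are skipped. The function name and the empty-result check are hypothetical additions, meant to fail loudly instead of writing a file that later triggers "Missing key(s)".

```python
import torch


def convert_to_ip_adapter(ckpt_path, out_path):
    """Split a training checkpoint into the dict layout the demo loads."""
    state_dict = torch.load(ckpt_path, map_location="cpu")
    image_proj_sd, ip_sd = {}, {}
    for key, tensor in state_dict.items():
        if key.startswith("image_proj_model."):
            image_proj_sd[key[len("image_proj_model."):]] = tensor
        elif key.startswith("adapter_modules."):
            ip_sd[key[len("adapter_modules."):]] = tensor
        # "unet.*" keys are skipped: the base UNet is loaded separately.
    if not ip_sd:
        raise ValueError("no adapter_modules.* keys found; were they removed while saving?")
    torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, out_path)
    # Return counts so the caller can sanity-check the result.
    return len(image_proj_sd), len(ip_sd)
```

Usage would be something like `convert_to_ip_adapter("pytorch_model.bin", "ip_adapter.bin")`, with the checkpoint path adjusted to the saved step.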
