I am trying to fine-tune a pretrained model with limited VRAM. To do this, I am using quantization together with automatic mixed precision (AMP). However, I have run into an error that I cannot seem to resolve. Could you help me figure out what is going wrong?
Here is a minimal example:
import os

# Restrict the process to physical GPU 7 before CUDA is initialized;
# inside this process it is visible as device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "7"

import torch
from torch.cuda.amp import GradScaler
from transformers import BitsAndBytesConfig, GPT2TokenizerFast, OPTForCausalLM

model_name = "facebook/opt-1.3b"
cache_dir = "./models"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
pretrained_model: OPTForCausalLM = OPTForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    quantization_config=quantization_config,
)
tokenizer: GPT2TokenizerFast = GPT2TokenizerFast.from_pretrained(
    model_name,
    cache_dir=cache_dir,
)
optimizer = torch.optim.AdamW(pretrained_model.parameters(), lr=1e-4)
scaler = GradScaler()

# Dummy batch: token ids with next-token labels, placed on the visible GPU.
input_ids = torch.LongTensor([[0, 1, 2, 3]]).to(0)
labels = torch.LongTensor([[1, 2, 3, 4]]).to(0)
with torch.autocast(device_type='cuda'):
    out = pretrained_model(input_ids=input_ids, labels=labels)
    loss = out.loss

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
print('End')
At the line

scaler.step(optimizer)

the following error is raised:
Exception has occurred: ValueError: Attempting to unscale FP16 gradients.
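In case it is relevant, the snippet below (a quick diagnostic sketch, run right after from_pretrained) counts the dtypes of the trainable parameters that the optimizer receives. My understanding is that with load_in_4bit the modules left unquantized (embeddings, layer norms, lm_head) are typically kept in torch.float16, so their gradients are FP16 as well:

from collections import Counter

# Diagnostic only: histogram of dtypes over the parameters that will receive
# gradients. Any torch.float16 entry here is a parameter whose FP16 gradient
# GradScaler would later try to unscale.
trainable_dtypes = Counter(
    p.dtype for p in pretrained_model.parameters() if p.requires_grad
)
print(trainable_dtypes)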
Thanks in advance for your help!