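I'm trying to run the gemma-2b model with vLLM (as listed at this link: https://docs.vllm.ai/en/latest/models/supported_models.html). At first I tried running it with Gemma's default settings, which use bfloat16, but that didn't work. Here is the error message: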
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA GeForce RTX 2080 Ti GPU has compute capability 7.5. You can use float16 instead by explicitly setting thedtype flag in CLI, for example: --dtype=half.
So I changed the data type to float16. That made the code run. However, the output makes no sense.
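For reference, the rule stated in that error message can be sketched in plain Python. This is only an illustration of the check as the message describes it, not vLLM's actual code; `pick_dtype` is a hypothetical helper name I made up, and the compute capability is passed in by hand rather than queried from the GPU.

```python
def pick_dtype(major: int, minor: int) -> str:
    """Return a dtype string for a GPU with compute capability major.minor.

    Hypothetical helper: mirrors the rule quoted in the error message
    (bfloat16 needs compute capability >= 8.0), nothing more.
    """
    if (major, minor) >= (8, 0):
        return "bfloat16"
    # Older GPUs fall back to half precision, as the message suggests.
    return "float16"


# The RTX 2080 Ti in the error message reports compute capability 7.5.
print(pick_dtype(7, 5))
```

On this card the sketch returns "float16", which is the value I ended up passing as the dtype.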
Here is my code:
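from vllm import LLM
# dtype set explicitly to float16, as described above (bfloat16 is rejected on this GPU)
llm = LLM(model="google/gemma-2b", dtype="float16")  # Name or path of your model
output = llm.generate("Hello, my name is")
print(output)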
This is the response generated for my prompt:
outputs=[CompletionOutput(index=0, text=' GHFW问我 ThinkmariKeywordsნ calendrier譁 интел个人 Kabupatenargu antem викона MILE'
Does anyone have any ideas? Please help.
I also tried dtype='float32', but the GPU can't handle it. Here is the error message:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacty of 10.75 GiB of which 81.19 MiB is free. Including non-PyTorch memory, this process has 9.74 GiB memory in use. Of the allocated memory 9.41 GiB is allocated by PyTorch, and 13.51 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
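A rough back-of-the-envelope check of why float32 does not fit on this card. It assumes gemma-2b has about 2.5 billion parameters (my assumption, not something the error message states) and counts only the weights, ignoring the KV cache and activations. `weight_gib` and `ASSUMED_PARAMS` are names made up for this sketch.

```python
GIB = 1024 ** 3
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_gib(num_params: float, dtype: str) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


# Assumption: roughly 2.5 billion parameters for gemma-2b.
ASSUMED_PARAMS = 2.5e9

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_gib(ASSUMED_PARAMS, dtype):.2f} GiB of weights")
```

Under that assumption the float32 weights alone come to roughly 9.3 GiB, which is close to the 9.41 GiB PyTorch reports as allocated and leaves almost nothing of the 10.75 GiB total for anything else. In float16 the weights would take about half of that.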
Has this been solved? I'm running into the same problem: gemma-2b keeps producing gibberish.