I am currently working on a project involving the MobileVLM model, loading the pretrained weights with the Hugging Face Transformers library. When I run my script on a SLURM cluster, I hit the following error:
Exception ignored in atexit callback: <function matmul_ext_update_autotune_table at 0x7fbb99fa4ca0>
Traceback (most recent call last):
  File "/public/home/swun-caiy2/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 444, in matmul_ext_update_autotune_table
    fp16_matmul._update_autotune_table()
  File "/public/home/swun-caiy2/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 421, in _update_autotune_table
    TritonMatmul._update_autotune_table(__class__.__name__ + "_2d_kernel", __class__._2d_kernel)
  File "/public/home/swun-caiy2/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 150, in _update_autotune_table
    cache_manager.put(autotune_table)
  File "/public/home/swun-caiy2/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 66, in put
    with FileLock(self.lock_path):
  File "/public/home/swun-caiy2/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/filelock/_api.py", line 297, in __enter__
    self.acquire()
  File "/public/home/swun-caiy2/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/filelock/_api.py", line 255, in acquire
    self._acquire()
  File "/public/home/swun-caiy2/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/filelock/_unix.py", line 48, in _acquire
    raise NotImplementedError(msg) from exception
NotImplementedError: FileSystem does not appear to support flock; user SoftFileLock instead
Here is the relevant part of the script I submit via SLURM:
from scripts.inference import inference_once
import torch
# model_path = "mtgv/MobileVLM-1.7B" # finetune
model_path = "/public/home/swun-caiy2/wensm/mobilevlm-v2/MobileVLM-main/mtgv"
image_file = "assets/samples/my_book.jpg"
# prompt_str = "who are you?\nIgnore the content of uploading pictures when answering questions."
prompt_str = "What is the title of this book?"
# (or) What is the title of this book?
# (or) Is this book related to Education & Teaching?
torch.cuda.set_device(0)
args = type('Args', (), {
    "model_path": model_path,
    "image_file": image_file,
    "prompt": prompt_str,
    "conv_mode": "v1",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
    "load_8bit": False,
    "load_4bit": False,
})()
inference_once(args)
How can I work around this file-locking failure on a filesystem that does not support flock?
If you have access to a directory on a filesystem that does support file locking (for example node-local scratch rather than the shared cluster filesystem), you can point the cache-directory environment variable there, e.g.:
$ export HF_HOME="/path/to/directory/with/file/locking"
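Note that the failing lock in this traceback is taken by DeepSpeed's Triton autotune cache, not by the Hugging Face cache, so redirecting only HF_HOME may not be enough. A minimal sketch of the idea, assuming a POSIX system: probe a candidate directory with fcntl.flock() before pointing the caches at it. The TRITON_CACHE_DIR variable is the one Triton's cache manager reads; the path /tmp/triton_cache below is just an illustrative choice, and these variables must be set before deepspeed/transformers are imported.

```python
# Sketch: verify that a directory's filesystem supports flock(), then
# redirect the Triton cache there. fcntl is POSIX-only, which matches
# the Linux nodes typical of a SLURM cluster.
import fcntl
import os
import tempfile

def supports_flock(directory: str) -> bool:
    """Return True if flock() succeeds on a file created in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as tmp:
            fcntl.flock(tmp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(tmp.fileno(), fcntl.LOCK_UN)
        return True
    except OSError:
        return False

# Hypothetical cache location on node-local storage; pick any directory
# for which supports_flock() returns True on your cluster.
cache_dir = "/tmp/triton_cache"
os.makedirs(cache_dir, exist_ok=True)
if supports_flock(cache_dir):
    # Must happen before importing deepspeed / transformers.
    os.environ["TRITON_CACHE_DIR"] = cache_dir
```

Because the exception is raised in an atexit callback, it is reported as "Exception ignored" and only pollutes the logs rather than crashing the run; redirecting the cache as above should silence it. If no flock-capable directory exists, filelock's own suggestion applies: the library's SoftFileLock (lock-file based, no flock) works on such filesystems, but whether DeepSpeed can be configured to use it depends on your DeepSpeed version.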