I trained Llama 2 7B and am trying to deploy the model on SageMaker:
```python
import json

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

model_s3_path = "s3://bucket/model/model.tar.gz"

# SageMaker config
instance_type = "ml.g4dn.2xlarge"
number_of_gpu = 1
health_check_timeout = 300
image = "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-cpu-py310-ubuntu20.04"

# Define model and endpoint configuration parameters
config = {
    "HF_MODEL_ID": "/opt/ml/model",            # path to where SageMaker stores the model
    "SM_NUM_GPUS": json.dumps(number_of_gpu),  # number of GPUs used per replica
    "MAX_INPUT_LENGTH": json.dumps(1024),      # max length of input text
    "MAX_TOTAL_TOKENS": json.dumps(2048),      # max length of the generation (including input text)
}

# Create the HuggingFaceModel with the image URI
llm_model = HuggingFaceModel(
    image_uri=image,
    role=sagemaker.get_execution_role(),
    model_data=model_s3_path,
    entry_point="deploy.py",
    source_dir="src",
    env=config,
)
```
To deploy it, I call:
```python
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,  # give SageMaker time to download the model
)
```
In my SageMaker workspace I have a `src` directory containing the `deploy.py` script that loads the model.
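Roughly, `deploy.py` follows the inference toolkit's handler convention (`model_fn`/`predict_fn`); a simplified sketch, with illustrative loading code and generation parameters:

```python
# deploy.py -- simplified sketch; the real script may differ.
# The SageMaker Hugging Face inference toolkit calls model_fn() once at
# container startup and predict_fn() for each invocation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def model_fn(model_dir):
    # model_dir is /opt/ml/model, where SageMaker unpacks model.tar.gz
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, torch_dtype=torch.float16, device_map="auto"
    )
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}
```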
The problem is that control never reaches `deploy.py`: when the cell calling `llm_model.deploy` runs, I get the following error:
```
Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 23, in <module>
    serving.main()
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 34, in main
    _start_mms()
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 257, in call
    return attempt.get(self._wrap_exception)
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/conda/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 30, in _start_mms
    mms_model_server.start_model_server(handler_service=HANDLER_SERVICE)
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_huggingface_inference_toolkit/mms_model_server.py", line 81, in start_model_server
    storage_dir = _load_model_from_hub(
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_huggingface_inference_toolkit/transformers_utils.py", line 204, in _load_model_from_hub
    files = HfApi().model_info(model_id).siblings
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/opt/ml/model'. Use `repo_type` argument if needed.
```
The container is trying to reach the Hugging Face Hub instead of loading the model from S3. How can I fix this?
`sagemaker.huggingface.HuggingFaceModel` can handle an S3 path in the `model_data` parameter, as described in this example. Since you are combining it with a custom image via `image_uri`, that image may not be compatible with SageMaker and may not attempt to handle the entry-point script you specified. To isolate the problem, try changing the code to use SageMaker's official image, then investigate why your custom image does not load the entry-point script.
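For instance, a fix along those lines might drop `image_uri` and let the SDK resolve the official Hugging Face DLC from version pins. This is a sketch, not a verified fix: the version strings mirror the image tag in your question, and `env` is omitted because `HF_MODEL_ID`, `SM_NUM_GPUS`, `MAX_INPUT_LENGTH`, and `MAX_TOTAL_TOKENS` are conventions of the TGI LLM container rather than of this inference image.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# Let the SDK resolve the official Hugging Face DLC from version pins
# instead of passing image_uri. Version strings are assumptions taken
# from the image tag in the question.
llm_model = HuggingFaceModel(
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    role=sagemaker.get_execution_role(),
    model_data="s3://bucket/model/model.tar.gz",
    entry_point="deploy.py",  # packaged from source_dir and run in the container
    source_dir="src",
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",
    container_startup_health_check_timeout=300,
)
```

Note also what your traceback shows: `mms_model_server.start_model_server` calls `_load_model_from_hub` whenever the `HF_MODEL_ID` environment variable is set, and `'/opt/ml/model'` is not a valid Hub repo id. So removing `HF_MODEL_ID` from `env` is likely part of the fix: without it, the toolkit serves the model that SageMaker unpacks from `model_data` into `/opt/ml/model`.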