When I call .predict() on a deployed SageMaker endpoint (serving a TensorFlow model), I get a maximum recursion depth exceeded error. I have placed logging statements throughout my inference script, specifically in my input_handler function and in the data-reading/cleaning functions the handler calls.

The logs show that everything works until the first call to my read_latest_file() function on user_heart_rate_uri. The log line "Called all user_data_uri functions" is printed to CloudWatch, but "read_latest_file on heart rate called" is not. I have added further logging inside read_latest_file(), and I think I know where the problem occurs.
**Here is part of my input_handler function:**
```python
def input_handler(data, context):
    logger.info("input_handler entered")
    if context.request_content_type == 'application/json':  # should be context for testing
        logger.info(f"Raw data type: {type(data)}")
        d = data.read().decode('utf-8')
        logger.info(f"Decoded data type: {type(d)}")
        logger.info(f"d value: {d}")
        data_object = json.loads(d)
        data_object_dict_check = isinstance(data_object, dict)
        data_object_string_check = isinstance(data_object, str)
        if (not data_object_dict_check and data_object_string_check):
            logger.info("String and not dictionary")
            data_object = ast.literal_eval(data_object)
        data_object_type_check = type(data_object)
        # logger.info(f"data_object value : {data_object}")
        logger.info(f"Deserialized data type check: {data_object_type_check}")
        # logger.info(f"data_object's type is {type(data_object)} and keys are {data_object.keys()}")
        try:
            user_data = data_object["user_data"]
            user_ids = data_object["user_ids"]
        except:
            logger.info(f"Except block, data_object value: {data_object}")
        logger.info("Get runs on data_object")
        logger.info(f"{user_data.keys()}")
        heart_rate_dictionary = {}  # {userid: {date1: val, date}}
        steps_dictionary = {}
        nonseq_dictionary = {}
        for user_id in user_ids:
            logger.info(f"Going through user: {user_id}")
            user_data_uri = user_data[user_id]  # this gives me a dictionary
            user_heart_rate_uri = user_data_uri["heart_rate"]
            user_step_count_uri = user_data_uri["step_count"]
            user_sleep_uri = user_data_uri["sleep"]
            user_demographic_uri = user_data_uri["demographic"]
            logger.info("Called all user_data_uri functions")
            deserialized_heart_data = read_latest_file(user_heart_rate_uri)
            logger.info("read_latest_file on heart rate called")
            deserialized_step_data = read_latest_file(user_step_count_uri)
            logger.info("read_latest_file on step data called")
            deserialized_sleep_data = read_latest_file(user_sleep_uri)
            logger.info("read_latest_file on sleep data called")
            deserialized_demographic_data = read_demographic(user_demographic_uri)
            logger.info("read_demographic called")
            logger.info("Called all read file functions")
```
**Here is my read_latest_file() function:**
```python
def read_latest_file(folder_uri):
    logger.info("read_latest_file entered")
    s3_client = boto3.client("s3")
    logger.info("s3_client initialized")
    bucket_name = "nashs3bucket15927-dev"
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder_uri)
    logger.info("list_objects_v2 called")
    latest_file = max(response.get('Contents', []), key=lambda x: x['LastModified']) if 'Contents' in response else None
    logger.info("Latest file found")
    if latest_file:
        logger.info("latest file not empty")
        # print("working")
        file_key = latest_file['Key']
        logger.info(f"file key: {file_key}")
        # Read the JSON file content from S3
        response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
        logger.info("get_object called, response received")
        file_content = json.loads(response['Body'].read().decode('utf-8'))
        logger.info("file decoded and deserialized, file_content received")
        logger.info(f"length of file_content: {len(file_content)}")
        return file_content
    else:
        logger.info("latest file empty")
        return None
```
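A minimal standalone check of the same boto3 calls, runnable locally with AWS credentials configured, would look like this (the prefix is a made-up example; the bucket name is the one from the script):

```python
# Standalone sanity check for the S3 listing logic, outside the endpoint.
import boto3

s3_client = boto3.client("s3")
response = s3_client.list_objects_v2(
    Bucket="nashs3bucket15927-dev",
    Prefix="user_001/heart_rate/",  # made-up prefix; use a real folder in the bucket
)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["LastModified"])
```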
The log "read_latest_file entered" is printed, but "s3_client initialized" is not. Further CloudWatch logs show that the call to boto3.client("s3") is what raises the error.

**Relevant CloudWatch logs for boto3.client("s3"):**
```
1596 2024-01-08 19:28:38,664 INFO read_latest_file entered
1596 2024-01-08 19:28:38,680 ERROR exception handling request: maximum recursion depth exceeded
Traceback (most recent call last):
  File "/sagemaker/python_service.py", line 373, in _handle_invocation_post
    res.body, res.content_type = handlers(data, context)
  File "/sagemaker/python_service.py", line 405, in handler
    processed_input = custom_input_handler(data, context)
  File "/opt/ml/model/code/inference.py", line 65, in input_handler
    deserialized_heart_data = read_latest_file(user_heart_rate_uri)
  File "/opt/ml/model/code/inference.py", line 133, in read_latest_file
    s3_client = boto3.client("s3")
  File "/usr/local/lib/python3.9/site-packages/boto3/__init__.py", line 92, in client
    return _get_default_session().client(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/boto3/session.py", line 299, in client
    return self._session.create_client(
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 951, in create_client
    credentials = self.get_credentials()
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 507, in get_credentials
    self._credentials = self._components.get_component(
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 1108, in get_component
    self._components[name] = factory()
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 186, in _create_credential_resolver
    return botocore.credentials.create_credential_resolver(
  File "/usr/local/lib/python3.9/site-packages/botocore/credentials.py", line 92, in create_credential_resolver
    container_provider = ContainerProvider()
  File "/usr/local/lib/python3.9/site-packages/botocore/credentials.py", line 1894, in __init__
    fetcher = ContainerMetadataFetcher()
  File "/usr/local/lib/python3.9/site-packages/botocore/utils.py", line 2846, in __init__
    session = botocore.httpsession.URLLib3Session(
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 313, in __init__
    self._manager = PoolManager(**self._get_pool_manager_kwargs())
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 331, in _get_pool_manager_kwargs
    'ssl_context': self._get_ssl_context(),
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 340, in _get_ssl_context
    return create_urllib3_context()
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 129, in create_urllib3_context
    context.options |= options
  File "/usr/local/lib/python3.9/ssl.py", line 602, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/local/lib/python3.9/ssl.py", line 602, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/local/lib/python3.9/ssl.py", line 602, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  [Previous line repeated 476 more times]
```
I want my S3 client to initialize so that I can access S3 resources from inside the endpoint to perform inference.

**Things I have tried:**
- Moving `s3_client = boto3.client("s3")` outside input_handler, immediately after all the imports in inference.py (see the sketch after this list). This still gives me the same SSL issue.
- Checking the permissions of the role my SageMaker endpoint uses. The role has full SageMaker access, which includes access to S3 resources. When creating the endpoint from a SageMaker Studio notebook, I used get_execution_role() to fetch the role and passed it as the role argument.
- Checking whether the endpoint was created in a VPC. When I go to "Endpoints" under VPC in the AWS console, it shows there are no endpoints.
- Consulting GPT-4. GPT-4 thinks this is a low-level networking problem and that I should contact AWS. I am skeptical, though; GPT said the same about another problem I ran into in the past, and that one turned out to be neither low-level nor difficult.
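For reference, the module-level placement from the first bullet looks roughly like this (a sketch; on the endpoint it still failed with the same recursion):

```python
# inference.py -- sketch of the module-level client placement I tried.
import ast
import json
import logging

import boto3

logger = logging.getLogger(__name__)

# Client created once at import time, right after the imports;
# this still triggered the same SSL recursion on the endpoint.
s3_client = boto3.client("s3")
```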
Finally, my question: why does calling boto3.client("s3") in my SageMaker endpoint's inference script raise a maximum recursion depth error that appears to stem from some SSL issue?
---

This error can be caused by gevent, used either directly or indirectly (by another library).

If it is gevent, the fix is to add

```python
import gevent.monkey
gevent.monkey.patch_all()
```

before any boto3/requests imports.
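As a minimal sketch of what that ordering looks like at the top of the inference script (assuming gevent is actually present in the serving container; the patch must run before anything imports ssl/urllib3):

```python
# inference.py -- illustrative import ordering, assuming gevent is in the container.
# Patching must happen first: once ssl/urllib3 are imported unpatched, botocore's
# SSLContext setup can recurse exactly as in the traceback above.
import gevent.monkey
gevent.monkey.patch_all()

# Only after patching is it safe to import boto3 and the rest.
import ast
import json
import logging

import boto3

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
```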