AWS Lambda function times out despite successfully creating the Parquet file

Problem description

I'm writing an AWS Lambda function in Python that simply converts a JSON file to Parquet. I set a 5-minute timeout and invoked the function. Although the Parquet file is created successfully, the function still times out. When I debug it, everything looks fine up to the to_parquet line, where the program hangs. It runs perfectly in my local environment. Does anyone have suggestions on how to fix this?

Python function

import awswrangler as wr
import pandas as pd
import urllib.parse
import os

# AWS settings, read from environment variables configured on the Lambda function
os_input_s3_cleansed_layer = os.environ['s3_cleansed_layer']
os_input_glue_catalog_db_name = os.environ['glue_catalog_db_name']
os_input_glue_catalog_table_name = os.environ['glue_catalog_table_name']
os_input_write_data_operation = os.environ['write_data_operation']

def lambda_handler(event, context):
    print('## EVENT')
    print(event)
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        # Creating DF from content
        df_raw = wr.s3.read_json('s3://{}/{}'.format(bucket, key))
        print('## DF RAW')
        print(df_raw.shape)
        # Extract required columns:
        df_step_1 = pd.json_normalize(df_raw['items'])
        print('## DF CLEANED')
        print(df_step_1.shape)
        # Write to S3
        wr_response = wr.s3.to_parquet(
            df=df_step_1,
            path=os_input_s3_cleansed_layer,
            dataset=True,
            database=os_input_glue_catalog_db_name,
            table=os_input_glue_catalog_table_name,
            mode=os_input_write_data_operation
        )
        print('## RESPONSE')
        return wr_response
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

AWS Lambda output

Test Event Name
s3-put

Response
{
  "errorMessage": "2024-07-31T07:11:36.885Z 7a24ab3f-13c8-4284-980f-6b705d64273a Task timed out after 307.11 seconds"
}

Function Logs
START RequestId: 7a24ab3f-13c8-4284-980f-6b705d64273a Version: $LATEST
## EVENT
{'Records': [{'eventVersion': '2.0', 'eventSource': 'aws:s3', 'awsRegion': 'us-east-1', 'eventTime': '1970-01-01T00:00:00.000Z', 'eventName': 'ObjectCreated:Put', 'userIdentity': {'principalId': 'EXAMPLE'}, 'requestParameters': {'sourceIPAddress': '127.0.0.1'}, 'responseElements': {'x-amz-request-id': 'EXAMPLE123456789', 'x-amz-id-2': 'EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH'}, 's3': {'s3SchemaVersion': '1.0', 'configurationId': 'testConfigRule', 'bucket': {'name': 'youtubepipeline-raw-useast1-dev', 'ownerIdentity': {'principalId': 'EXAMPLE'}, 'arn': 'arn:aws:s3:::youtubepipeline-raw-useast1-dev'}, 'object': {'key': 'youtube/raw_statistics_reference_data/CA_category_id.json', 'size': 1024, 'eTag': '0123456789abcdef0123456789abcdef', 'sequencer': '0A1B2C3D4E5F678901'}}}]}
## DF RAW
(31, 3)
## DF CLEANED
(31, 6)
2024-07-31T07:11:36.885Z 7a24ab3f-13c8-4284-980f-6b705d64273a Task timed out after 307.11 seconds

END RequestId: 7a24ab3f-13c8-4284-980f-6b705d64273a
REPORT RequestId: 7a24ab3f-13c8-4284-980f-6b705d64273a  Duration: 307106.33 ms  Billed Duration: 300000 ms  Memory Size: 128 MB Max Memory Used: 128 MB Init Duration: 4043.97 ms

Request ID
7a24ab3f-13c8-4284-980f-6b705d64273a

I'm looking for suggestions on how to resolve this issue.

python amazon-s3 aws-lambda
1 Answer

You likely need to increase the memory size: the report shows that the maximum memory used equals the full amount allocated, which suggests the function is running out of memory at 128 MB:

Memory Size: 128 MB Max Memory Used: 128 MB
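If you want to script the change rather than use the console, a minimal sketch with boto3 could look like this (the function name json-to-parquet is a placeholder, and 512 MB is just a starting point; awswrangler bundles pandas and pyarrow, which need headroom). Lambda also allocates CPU in proportion to memory, so raising it gives the Parquet write more compute as well:

import boto3

# Raise the Lambda memory allocation; CPU scales with memory,
# so this also speeds up the pyarrow Parquet write.
client = boto3.client('lambda')
client.update_function_configuration(
    FunctionName='json-to-parquet',  # placeholder: use your function's name
    MemorySize=512,                  # up from the 128 MB shown in the report
)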