Downloading large files from S3 with aioboto3 and aiofiles is really slow

Question  Votes: 0  Answers: 1

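I have a high-load system where many users can upload large files (1 GB and up). After an upload, I sometimes need to download the file from S3 to compute some metadata. This is the code I currently use for that (see fetch_file):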

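import aioboto3
import aiofiles.tempfile
import aiohttp
import botocore.config
import botocore.exceptions


def _get_file_extension(key: str) -> str: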
    ...

class S3Storage:
    def __init__(self, *args, **kwargs):
        ...
        self._config = botocore.config.Config(
            read_timeout=read_timeout,
            connect_timeout=connect_timeout,
            retries={
                "total_max_attempts": ...,
                "max_attempts": ...,
            },
            signature_version="v4",
        )
        self._session = self._create_session()

    def _create_session(self):
        return aioboto3.Session()

    def _create_client(self):
        return self._session.client(
            service_name="s3",
            endpoint_url=self._endpoint_url,
            region_name=self._region,
            aws_access_key_id=self._access_key_id,
            aws_secret_access_key=self._secret_access_key,
            config=self._config,
        )

    async def fetch_file(self, key: str, bucket: str | None = None) -> str:
        try:
            async with self._create_client() as client:
                response = await client.get_object(Bucket=bucket or self._bucket, Key=key)
                async with aiofiles.tempfile.NamedTemporaryFile(
                    "wb",
                    suffix=_get_file_extension(key),
                    delete=False,
                ) as file:
                    async for chunk in response["Body"]:
                        await file.write(chunk)
                return str(file.name)
        except botocore.exceptions.ClientError as e:
            ...
        except aiohttp.ServerTimeoutError as e:
            ...
        except (
            botocore.exceptions.BotoCoreError,
            aiohttp.ClientError,
        ) as e:
            ...

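For some reason this implementation is really slow. Profiling shows that downloading a 1 GB file takes 300 seconds, and all 300 seconds are spent inside the fetch_file method.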

Can someone help me figure out why downloading a file from S3 takes this long, and what I can do about it?

python python-3.x amazon-s3 aiobotocore
1 Answer
0 votes

I adapted your code, starting from the fetch_file method. Replace everything inside the try block with the following:

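async with self._create_client() as client:
    # Create an empty temp file only to reserve a path; it is closed on exit
    # and stays on disk because delete=False.
    async with aiofiles.tempfile.NamedTemporaryFile(
        "wb",
        suffix=_get_file_extension(key),
        delete=False,
    ) as file:
        path = str(file.name)
    # download_file expects a filesystem path for Filename, not a file object.
    await client.download_file(Bucket=bucket or self._bucket, Key=key, Filename=path)
    return path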

I hope this works. I can't test it right now!
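
If you would rather keep streaming the body yourself, one likely cause of the slowness is the number of writes: iterating the response body directly tends to yield very small chunks (1 KiB by default in the aiobotocore versions I have seen, which is an assumption worth checking against your installed version), so a 1 GB file turns into roughly a million awaited writes, each dispatched to a thread. Below is a minimal sketch of coalescing small chunks into large writes. It uses only the standard library; copy_stream_to_file and fake_body are hypothetical names, and fake_body merely stands in for response["Body"].

```python
import asyncio
from typing import AsyncIterator


async def copy_stream_to_file(
    chunks: AsyncIterator[bytes],
    path: str,
    buffer_size: int = 8 * 1024 * 1024,
) -> int:
    """Write an async stream of byte chunks to `path` in large blocks.

    Returns the number of write calls that were performed.
    """
    buffer = bytearray()
    writes = 0
    with open(path, "wb") as file:
        async for chunk in chunks:
            buffer += chunk
            # Only touch the disk once enough data has accumulated.
            if len(buffer) >= buffer_size:
                await asyncio.to_thread(file.write, buffer)
                writes += 1
                buffer.clear()
        # Flush whatever is left after the stream ends.
        if buffer:
            await asyncio.to_thread(file.write, buffer)
            writes += 1
    return writes


async def fake_body(total: int, chunk_size: int = 1024) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for response["Body"]: yields small chunks."""
    sent = 0
    while sent < total:
        size = min(chunk_size, total - sent)
        yield b"x" * size
        sent += size
```

In fetch_file this would be called with the real body, for example await copy_stream_to_file(response["Body"], path). aiobotocore's StreamingBody also appears to offer iter_chunks(chunk_size), which may let you request larger chunks directly; verify that against your version before relying on it.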
