I have a high-load system where many users upload large files (1 GB+). After an upload, I sometimes need to download the file from S3 to compute some metadata. Currently I use this code to do it (see "fetch_file"):
def _get_file_extension(key) -> str:
    ...


class S3Storage:
    def __init__(self, *args, **kwargs):
        ...
        self._config = botocore.config.Config(
            read_timeout=read_timeout,
            connect_timeout=connect_timeout,
            retries={
                "total_max_attempts": ...,
                "max_attempts": ...,
            },
            signature_version="v4",
        )
        self._session = self._create_session()

    def _create_session(self):
        return aioboto3.Session()

    def _create_client(self):
        return self._session.client(
            service_name="s3",
            endpoint_url=self._endpoint_url,
            region_name=self._region,
            aws_access_key_id=self._access_key_id,
            aws_secret_access_key=self._secret_access_key,
            config=self._config,
        )

    async def fetch_file(self, key: str, bucket: str | None = None) -> str:
        try:
            async with self._create_client() as client:
                response = await client.get_object(Bucket=bucket or self._bucket, Key=key)
                async with aiofiles.tempfile.NamedTemporaryFile(
                    "wb",
                    suffix=_get_file_extension(key),
                    delete=False,
                ) as file:
                    async for chunk in response["Body"]:
                        await file.write(chunk)
                    return str(file.name)
        except botocore.exceptions.ClientError as e:
            ...
        except aiohttp.ServerTimeoutError as e:
            ...
        except (
            botocore.exceptions.BotoCoreError,
            aiohttp.ClientError,
        ) as e:
            ...
For some reason this implementation is really slow. After profiling, I found that downloading a 1 GB file takes 300 seconds, and all 300 seconds are spent inside the "fetch_file" method.
Can someone help me figure out why downloading a file from S3 takes so long, and what I can do about it?
I adapted your code, starting from the "fetch_file" method. Replace the entire body of the try block with the following:
async with self._create_client() as client:
    async with aiofiles.tempfile.NamedTemporaryFile(
        "wb",
        suffix=_get_file_extension(key),
        delete=False,
    ) as file:
        # download_file expects a filesystem path, not a file object
        await client.download_file(
            Bucket=bucket or self._bucket,
            Key=key,
            Filename=file.name,
        )
        return str(file.name)
I hope this will work. I can't test it at the moment!
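For context on why this tends to be much faster: `download_file` goes through boto3's transfer manager, which splits a large object into ranged GETs and downloads the parts concurrently (tunable via `boto3.s3.transfer.TransferConfig`, e.g. `max_concurrency` and `multipart_chunksize`), whereas the original `get_object` loop streams the whole body sequentially over a single connection. Here is a minimal, S3-free sketch of that difference using plain asyncio — `fetch_part` is a hypothetical stand-in for one ranged GET, with `asyncio.sleep` simulating network latency:

```python
import asyncio
import time

PART_COUNT = 8
PART_DELAY = 0.05  # simulated network latency per part, in seconds


async def fetch_part(i: int) -> bytes:
    # Stand-in for one ranged GET against S3.
    await asyncio.sleep(PART_DELAY)
    return bytes([i]) * 4


async def sequential() -> bytes:
    # Like the get_object streaming loop: one part at a time.
    parts = []
    for i in range(PART_COUNT):
        parts.append(await fetch_part(i))
    return b"".join(parts)


async def concurrent() -> bytes:
    # Like the transfer manager: all ranged GETs in flight at once;
    # gather preserves order, so the parts reassemble correctly.
    parts = await asyncio.gather(*(fetch_part(i) for i in range(PART_COUNT)))
    return b"".join(parts)


async def main():
    t0 = time.perf_counter()
    seq = await sequential()
    t_seq = time.perf_counter() - t0

    t0 = time.perf_counter()
    con = await concurrent()
    t_con = time.perf_counter() - t0

    assert seq == con  # same bytes either way, only the timing differs
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")


asyncio.run(main())
```

If `download_file` alone isn't fast enough, passing a `TransferConfig` with a higher `max_concurrency` via its `Config` parameter is the usual next knob to try.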