Django 4 async StreamingHttpResponse: sending a large generated zip file

Question

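I'm trying to zip some large files in Django and stream them to the client as a zip file. The problem is that the zip file tends to be built entirely in memory first and only then sent. This exhausts memory and crashes Django. Writing the zip file to disk before sending it isn't an option either, because storage space is limited.

I was able to do this with the zipfly package, and it worked well while Django ran synchronously under WSGI. But then I needed to implement WebSockets, which meant running Django asynchronously under ASGI, and the streaming zipfly implementation broke.

So I started writing my own implementation. It almost works, but it has problems. It currently tries to build the zip on the fly, yet it still uses quite a lot of memory. In addition, my code never signals to the client that the stream has ended, so the browser eventually aborts the transfer.

Can anyone help? Here is the code: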

import os
import zipfile
from io import BytesIO

from django.http import StreamingHttpResponse


async def create_zip_streaming_response(files_to_compress: list, filename: str):
    # Create a generator function to yield chunks of data
    async def generate():
        # Create an in-memory byte buffer to store the compressed data
        buffer = BytesIO()

        # Create a zip file object
        with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
            # Iterate over the files and add them to the zip file
            for file_path in files_to_compress:
                # Get the file name from the file path
                file_name = os.path.basename(file_path)

                # Add the file to the zip file
                zip_file.write(file_path, file_name)

                # Flush the buffer to ensure data is written
                buffer.flush()

                # Move the buffer's read position to the beginning
                buffer.seek(0)

                # Read the data from the buffer
                data = buffer.read()

                # Yield the generated data in chunks
                yield data

                # Reset the buffer for the next file
                buffer.truncate(0)

    # Create a streaming response using the generator function
    response = StreamingHttpResponse(generate(), content_type='application/zip')
    response['Content-Disposition'] = f'attachment; filename="{filename}"'

    return response

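I'd really appreciate any help. This is a hobby project I enjoy a lot, but I don't know how to get past this problem. The project is deployed in Docker and, in production, runs behind nginx together with uvicorn.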
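Both symptoms in the question can be traced to standard-library behavior. First, zipfile writes an archive's central directory and its end-of-central-directory record (signature PK\x05\x06) only when the ZipFile is closed. In the code above that happens when the with block exits, after the last yield, so those bytes never reach the client and the browser receives a truncated archive. Second, BytesIO.truncate(0) does not move the stream position, so the next write lands at the old offset and the gap is padded with NUL bytes. Each yielded chunk therefore carries zeros for everything sent before it, which would explain the growing memory use. The sketch below (standard library only; both function names are made up for illustration) demonstrates each effect in isolation.

```python
import io
import zipfile

# Signature of the end-of-central-directory record that terminates a zip archive.
EOCD_SIGNATURE = b"PK\x05\x06"


def central_directory_written_on_close() -> tuple[bool, bool]:
    """Return (EOCD present before close(), EOCD present after close())."""
    buffer = io.BytesIO()
    zip_file = zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED)
    zip_file.writestr("a.txt", "hello " * 1000)
    before = EOCD_SIGNATURE in buffer.getvalue()
    zip_file.close()  # writes the central directory and the EOCD record
    after = EOCD_SIGNATURE in buffer.getvalue()
    return before, after


def truncate_keeps_position() -> bytes:
    """truncate(0) does not rewind, so the next write is padded with NUL bytes."""
    buffer = io.BytesIO()
    buffer.write(b"abc")
    buffer.seek(0)
    buffer.read()       # position is now 3
    buffer.truncate(0)  # size becomes 0, but the position stays at 3
    buffer.write(b"d")  # written at offset 3; offsets 0-2 are filled with NULs
    return buffer.getvalue()


print(central_directory_written_on_close())  # (False, True)
print(truncate_keeps_position())             # b'\x00\x00\x00d'
```

Yielding the remaining buffer contents after the ZipFile is closed, and using a sink that simply discards what has already been sent, addresses both problems.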

Answer 1

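I ran into the same problem and got it working by modifying your implementation: I added a final "flush" (yielding whatever is left in the buffer after the ZipFile has been closed) and used my own buffer class. I didn't run into any of the issues you mentioned during testing.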

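import zipfile

from django.http import StreamingHttpResponse


class Buffer:
    """Write-only sink that hands out whatever zipfile has written so far."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, data):
        self.buf.extend(data)
        return len(data)  # zipfile tracks its write offset from this value

    def flush(self):
        # zipfile calls flush(); nothing to do for an in-memory sink
        pass

    def take(self):
        # Return the bytes written since the last take() and start a fresh buffer
        buf = self.buf
        self.buf = bytearray()
        return bytes(buf)

    def end(self):
        # Return the remaining bytes and disable the buffer
        buf = self.buf
        self.buf = None
        return bytes(buf)


def create_zip_streaming_response(*args):
    # Hard-coded test files; replace with the real list
    files_to_compress = ['a.txt', 'b.txt', 'c.csv']

    def generate():
        buffer = Buffer()

        # Buffer has no seek()/tell(), so zipfile treats it as an unseekable
        # stream and never seeks back into data that was already sent
        with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
            for file_path in files_to_compress:
                zip_file.writestr(file_path, "This is file " + file_path)

                yield buffer.take()

        # Closing the ZipFile wrote the central directory; send it last
        yield buffer.end()

    response = StreamingHttpResponse(generate(), content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename="test.zip"'
    return response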
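Answer 1's test uses writestr with tiny strings. With large files on disk, writestr needs each file's whole contents in memory, and even ZipFile.write emits an entire file between two yields. A minimal sketch of one way to keep memory bounded, using only the standard library: open each archive entry with ZipFile.open(zinfo, "w") and copy the file in fixed-size chunks, yielding whatever the sink holds after every chunk. This roughly mirrors what CPython's ZipFile.write() does internally, plus the yields. The names stream_zip and ChunkSink and the 64 KiB chunk size are illustrative assumptions, and every path is assumed to be a regular file.

```python
import os
import zipfile
from typing import Iterable, Iterator

CHUNK_SIZE = 64 * 1024  # illustrative read size, not a tuned value


class ChunkSink:
    """Write-only sink; with no seek()/tell(), zipfile treats it as unseekable."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def write(self, data: bytes) -> int:
        self._buf.extend(data)
        return len(data)

    def flush(self) -> None:
        pass

    def take(self) -> bytes:
        # Hand out everything written so far and empty the buffer
        data = bytes(self._buf)
        self._buf.clear()
        return data


def stream_zip(file_paths: Iterable[str]) -> Iterator[bytes]:
    """Yield a zip archive of file_paths piece by piece, one read chunk at a time."""
    sink = ChunkSink()
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED) as zip_file:
        for path in file_paths:
            # from_file records name, mtime and size; the size lets zipfile
            # choose ZIP64 headers on its own for very large files
            zinfo = zipfile.ZipInfo.from_file(path, os.path.basename(path))
            zinfo.compress_type = zipfile.ZIP_DEFLATED
            with open(path, "rb") as src, zip_file.open(zinfo, "w") as dest:
                while chunk := src.read(CHUNK_SIZE):
                    dest.write(chunk)
                    data = sink.take()
                    if data:  # the compressor may not have emitted anything yet
                        yield data
            # Closing the entry wrote its remaining compressed data and data descriptor
            yield sink.take()
    # Closing the ZipFile wrote the central directory; without it the archive is truncated
    yield sink.take()
```

In the view it would take the place of generate(), for example StreamingHttpResponse(stream_zip(paths), content_type='application/zip'). Peak memory is then bounded by roughly one chunk plus the compressor's internal state, rather than by the size of the largest file.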

Answer 2

python-zipstream solved this for me. For the record, I'm not a contributor to that project.

First, my view looks like this:

from django.http import StreamingHttpResponse


def download_report(request):  # view name is illustrative
    return StreamingHttpResponse(
        generate_report(),
        headers={
            "Content-Type": 'application/zip',
            "Content-Disposition": 'attachment; filename="report.zip"',
            # Let browser JavaScript read the filename on cross-origin requests
            "Access-Control-Expose-Headers": 'Content-Disposition',
        },
    )

And this is my generate_report() function:

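from zipstream import ZipFile


def generate_report():
    files = ['/path/to/file1']

    zip_file = ZipFile()

    for file in files:
        archive_path = "..."  # path of the entry inside the archive
        # Read with a context manager so the file handle is closed;
        # note that this loads the whole file into memory
        with open(file, 'rb') as f:
            file_content = f.read()
        zip_file.writestr(archive_path, file_content)

        # Yield the bytes produced for this file so far
        yield from zip_file.flush()

    # Yield whatever is left, which finishes the archive
    yield from zip_file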

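It works for me under ASGI, and for the same example my request went from about 11 seconds down to 2.5 seconds. Surprisingly, though, my zip file grew from 212 MB to 282 MB. I'll look into it.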
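Both answers hand StreamingHttpResponse a synchronous generator. As far as I know, Django 4.2 and later stream async iterators under ASGI, but for a synchronous iterator they emit a warning and consume the whole iterator before sending anything, which would bring back the memory problem from the question. A minimal sketch of an adapter, assuming Python 3.9+ for asyncio.to_thread: it turns any blocking iterator into an async generator that computes one item per worker-thread call, so file reads and compression stay off the event loop. The name iterate_in_thread is hypothetical.

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")

# Sentinel returned by next() once the wrapped iterator is exhausted
_DONE = object()


async def iterate_in_thread(sync_iter: Iterator[T]) -> AsyncIterator[T]:
    """Yield items from a blocking iterator, computing each one in a worker thread."""
    iterator = iter(sync_iter)
    while True:
        # Only one next() call runs at a time, so the generator is never
        # resumed concurrently even though threads may differ between calls
        item = await asyncio.to_thread(next, iterator, _DONE)
        if item is _DONE:
            break
        yield item


# Usage in a view (Django 4.2+ under ASGI), e.g.:
# StreamingHttpResponse(iterate_in_thread(generate_report()), content_type="application/zip")
```

The sentinel default passed to next() is deliberate: asyncio futures refuse to carry a StopIteration, so letting the exhausted iterator raise it inside the worker thread would surface as a different error instead of ending the loop cleanly.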
