Opening a zip file with an unsupported compression type silently returns an empty file stream instead of throwing an exception

Question

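I seem to be getting knocked over by a newbie error, and I am not a newbie. I have a 1.2G known-good zip file 'train.zip' containing a 3.5G file 'train.csv'. I open the zip file and the file itself without any exceptions (no LargeZipFile), but the resulting file stream appears to be empty. (UNIX 'unzip -c ...' confirms the archive is good.) The file objects returned by Python's ZipFile.open() are not seekable or tellable, so I can't check that.

The Python distribution is 2.7.3 EPD-free 7.3-1 (32-bit), but that should be fine for large zips. The OS is MacOS 10.6.6.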

import csv
import os
import zipfile as zf

zip_pathname = os.path.join('/my/data/path/.../', 'train.zip')
#with zf.ZipFile(zip_pathname).open('train.csv') as z:
z = zf.ZipFile(zip_pathname, 'r', zf.ZIP_DEFLATED, allowZip64=True) # I tried all permutations
z.debug = 1
z.testzip() # zipfile integrity is ok

z1 = z.open('train.csv', 'r') # our file keeps coming up empty?

# Check the info to confirm z1 is indeed a valid 3.5Gb file...
z1i = z.getinfo('train.csv')
for att in ('filename', 'file_size', 'compress_size', 'compress_type', 'date_time',  'CRC', 'comment'):
    print '%s:\t' % att, getattr(z1i,att)
# ... and it looks ok. compress_type = 9 ok?
#filename:  train.csv
#file_size: 3729150126
#compress_size: 1284613649
#compress_type: 9
#date_time: (2012, 8, 20, 15, 30, 4)
#CRC:   1679210291

# All attempts to read z1 come up empty?!
# z1.readline() gives ''
# z1.readlines() gives []
# z1.read() takes ~60sec but also returns '' ?

# code I would want to run (inside a function) is:
reader = csv.reader(z1)
header = reader.next()
return reader

Answer 1

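The cause is a combination of two things:

This file's compression type is type 9: Deflate64/Enhanced Deflate (PKWare's proprietary format, as opposed to the more common type 8),
and a zipfile bug: it did not throw an exception for unsupported compression types. It used to just silently return a bad file object [Section 4.4.5, compression method]. Aargh. How bogus. UPDATE: I filed bug 14313 and it was fixed back in 2012, so zipfile now raises NotImplementedError when the compression type is unknown.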
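The compression type can also be checked up front, before any member is opened. The following is a minimal sketch, assuming Python 3.3 or later; `find_unsupported_members` and `SUPPORTED_METHODS` are hypothetical names introduced here for illustration, and the demo archive is a small in-memory stand-in rather than the real 'train.zip'.

```python
import io
import zipfile

# Compression methods the standard-library zipfile module can decode
# (assumption: Python 3.3+, where bzip2 and lzma constants exist).
SUPPORTED_METHODS = {
    zipfile.ZIP_STORED,    # 0
    zipfile.ZIP_DEFLATED,  # 8
    zipfile.ZIP_BZIP2,     # 12
    zipfile.ZIP_LZMA,      # 14
}


def find_unsupported_members(archive):
    """Return (filename, compress_type) pairs that zipfile cannot decompress."""
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.compress_type)
            for info in zf.infolist()
            if info.compress_type not in SUPPORTED_METHODS
        ]


# Demo on a small in-memory archive written with plain Deflate (type 8).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("train.csv", "id,value\n1,2\n")
buf.seek(0)

unsupported = find_unsupported_members(buf)
print(unsupported)
```

A Deflate64 member would show up in the returned list with `compress_type` 9, which is the value printed by the `getinfo()` loop in the question.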

The workaround is to unzip the archive and then rezip it, to get a plain type 8: Deflated.
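The rezip half of that workaround can be done from Python itself. This is a sketch under stated assumptions: the file has already been extracted with an external tool that understands Deflate64 (for example `unzip`), `rezip_as_deflate` is a hypothetical helper name, and the demo uses a tiny stand-in file in a temporary directory instead of the real 3.5G CSV.

```python
import os
import tempfile
import zipfile


def rezip_as_deflate(extracted_path, new_zip_path):
    """Write an already-extracted file into a new archive using plain Deflate (type 8)."""
    with zipfile.ZipFile(new_zip_path, "w",
                         compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        zf.write(extracted_path, arcname=os.path.basename(extracted_path))


# Demo with a small stand-in for the extracted 'train.csv'.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "train.csv")
with open(csv_path, "w", newline="") as f:
    f.write("id,value\n1,2\n")

new_zip = os.path.join(workdir, "train_deflate.zip")
rezip_as_deflate(csv_path, new_zip)

# Confirm the new archive uses type 8 and reads back intact.
with zipfile.ZipFile(new_zip) as zf:
    recompressed_type = zf.getinfo("train.csv").compress_type
    round_trip = zf.read("train.csv")
print(recompressed_type)
```

`allowZip64=True` matters for the real file, since the uncompressed size in the question (3729150126 bytes) is above 2 GiB.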

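zipfile will throw an exception in 2.7 and 3.2+. I guess that, for legal reasons, zipfile will never be able to actually handle type 9. The Python docs make no mention that zipfile cannot handle other compression types :(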
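To illustrate the post-fix behaviour without a real Deflate64 archive, the sketch below forges one: it writes a tiny stored archive and then relabels the member's compression method as 9 in both the local header and the central directory. The byte-patching and the helper names `forge_type9_archive` and `read_member` are assumptions made purely for this demonstration; a real type 9 archive would come from a tool such as Windows Explorer.

```python
import io
import zipfile


def forge_type9_archive():
    """Build a tiny stored archive, then relabel its member as method 9 (Deflate64)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("train.csv", "id,value\n1,2\n")
    data = bytearray(buf.getvalue())
    method = (9).to_bytes(2, "little")
    # Local file header starts at offset 0; the method field is bytes 8-9.
    data[8:10] = method
    # Central directory header: the method field is bytes 10-11 after its signature.
    central = data.index(b"PK\x01\x02")
    data[central + 10:central + 12] = method
    return io.BytesIO(bytes(data))


def read_member(archive, name):
    """Return (content, None) on success or (None, message) if the method is unsupported."""
    try:
        with zipfile.ZipFile(archive) as zf:
            with zf.open(name) as member:
                return member.read(), None
    except NotImplementedError as exc:
        return None, str(exc)


content, error = read_member(forge_type9_archive(), "train.csv")
print(error)
```

Catching `NotImplementedError` specifically, rather than using a bare `except`, keeps genuine I/O or corruption errors visible.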

Answer 2

from zipfile import ZipFile
import subprocess

def Unzip(zipFile, destinationDirectory):
    try:
        with ZipFile(zipFile, 'r') as zipObj:
            # Extract all the contents of zip file in different directory
            zipObj.extractall(destinationDirectory)
    except Exception:
        print("An exception occurred extracting with Python ZipFile library.")
        print("Attempting to extract using 7zip")
        # Fall back to the external 7z command-line tool
        subprocess.Popen(["7z", "e", f"{zipFile}", f"-o{destinationDirectory}", "-y"])

Answer 3

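The zipfile-deflate64 package is still in its "alpha" stage. I only started using it yesterday (2022-07-18), and it has worked well for me.

It is very easy to use: after importing it, you use the zipfile library as usual, with support for Deflate64 added.

Link to the "zipfile-deflate64" package on PyPI

Link to the "zipfile-deflate64" project on GitHub

Here is an example of how to use it. The API is the same as the built-in zipfile package: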

import zipfile_deflate64 as zipfile

tag_hist_path = "path\\to\\your\\zipfile.zip"
parentZip = zipfile.ZipFile(tag_hist_path, mode="r", compression=zipfile.ZIP_DEFLATED64)
fileNames = [f.filename for f in parentZip.filelist]
memberArchive = parentZip.open(fileNames[0], mode="r")
b = memberArchive.read()  # reading all bytes at once, assuming file isn't too big
txt = b.decode("utf-8")  # decode bytes to text string
memberArchive.close()
parentZip.close()

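As suggested by @smci, here is a cleaner, clearer way to handle such archives, so that you don't have to spend effort managing the stream resources (i.e. closing them) when an error occurs: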

import zipfile_deflate64 as zipfile

tag_hist_path = "path\\to\\your\\zipfile.zip"
with zipfile.ZipFile(tag_hist_path, mode="r", compression=zipfile.ZIP_DEFLATED64) as parentZip:
    for member in parentZip.filelist:
        with parentZip.open(member, mode="r") as memberArchive:
            # Do something with each opened member
            pass

Answer 4

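Python's zipfile module cannot read this file because zlib, which zipfile delegates to, does not support Deflate64. If smaller files work fine, I suspect this zip file was created by Windows Explorer: for larger files, Windows Explorer can decide to use Deflate64. (Note that Zip64 is different from Deflate64. Zip64 is supported by Python's zipfile module; it only changes how some metadata is stored in the zip file, and the data is still compressed with regular Deflate.)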
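The distinction between Zip64 and Deflate64 can be seen directly with the standard library. The sketch below, assuming Python 3.6 or later (where `ZipFile.open()` accepts `force_zip64`), writes a member with Zip64 metadata forced on and then confirms that its compression method is still regular Deflate (type 8). The tiny in-memory archive is a stand-in for illustration only.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w",
                     compression=zipfile.ZIP_DEFLATED,
                     allowZip64=True) as zf:
    # force_zip64=True writes Zip64 size fields even for a tiny member.
    with zf.open("train.csv", "w", force_zip64=True) as member:
        member.write(b"id,value\n1,2\n")

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("train.csv")
    method = info.compress_type   # compression method, independent of Zip64
    payload = zf.read("train.csv")

print(method == zipfile.ZIP_DEFLATED)
```

Zip64 only affects how sizes and offsets are recorded, so the member reads back normally; a Deflate64 member would instead report `compress_type` 9.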

However, stream-unzip now supports Deflate64. Modifying its example to read from the local disk, and using to-file-like-obj, TextIOWrapper and csv.reader to read a CSV file as in your example:

import csv
from io import TextIOWrapper
import os

from stream_unzip import stream_unzip
from to_file_like_obj import to_file_like_obj

def get_zipped_chunks(zip_pathname):
    # Read the archive from disk in 64 KiB chunks
    with open(zip_pathname, 'rb') as f:
        yield from iter(lambda: f.read(65536), b'')

def get_unzipped_chunks(zipped_chunks, filename):
    for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks):
        if file_name != filename:
            # Every member's chunks must be consumed, even when skipping it
            for chunk in unzipped_chunks:
                pass
            continue
        yield from unzipped_chunks

zipped_chunks = get_zipped_chunks(os.path.join('/my/data/path/.../', 'train.zip'))
unzipped_chunks = get_unzipped_chunks(zipped_chunks, b'train.csv')
str_lines = TextIOWrapper(to_file_like_obj(unzipped_chunks), encoding='utf-8', newline='')
csv_reader = csv.reader(str_lines)

for row in csv_reader:
    print(row)
