zlib.error: Error -3 while decompressing: incorrect header check

Question

I have a gzip file and I am trying to read it with Python as shown below:

import zlib

do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)

It throws this error:

zlib.error: Error -3 while decompressing: incorrect header check

How can I overcome it?

Answer 1

You are getting this error:

zlib.error: Error -3 while decompressing: incorrect header check

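This is most likely because you are trying to check a header that is not there, e.g. your data follows RFC 1951 (the deflate compressed format) rather than RFC 1950 (the zlib compressed format) or RFC 1952 (the gzip compressed format).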

Choosing windowBits

But zlib can decompress all of those formats:

- to (de-)compress the deflate format, use `wbits = -zlib.MAX_WBITS`
- to (de-)compress the zlib format, use `wbits = zlib.MAX_WBITS`
- to (de-)compress the gzip format, use `wbits = zlib.MAX_WBITS | 16`

See the documentation at http://www.zlib.net/manual.html#Advanced (section inflateInit2).
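As a rough sketch of how one of the three wbits values above might be picked automatically, the helper below peeks at the first two bytes of the data. The names `guess_wbits` and `decompress_any` are hypothetical, and the check is only a heuristic: raw deflate data carries no header, so in rare cases it can look like a zlib stream.

```python
import zlib


def guess_wbits(data: bytes) -> int:
    """Guess a wbits value from the first two bytes of a compressed stream."""
    # gzip streams (RFC 1952) start with the magic bytes 1f 8b
    if data[:2] == b"\x1f\x8b":
        return zlib.MAX_WBITS | 16
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # zlib streams (RFC 1950): compression method 8 in the low nibble,
        # and the two header bytes form a number divisible by 31
        if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
            return zlib.MAX_WBITS
    # Assumption: anything else is treated as raw deflate (RFC 1951)
    return -zlib.MAX_WBITS


def decompress_any(data: bytes) -> bytes:
    """Decompress deflate, zlib or gzip data using the guessed wbits."""
    return zlib.decompress(data, guess_wbits(data))


raw = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_data = raw.compress(b"test") + raw.flush()
print(decompress_any(raw_data))               # b'test'
print(decompress_any(zlib.compress(b"test")))  # b'test'
```

If the guess is wrong, `zlib.decompress` still raises `zlib.error`, so the caller can fall back to another wbits value.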

Examples

Test data:

>>> import zlib
>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>> 
>>> text = b'''test'''  # bytes: compress() does not accept str on Python 3
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>

The obvious test for zlib:

>>> zlib.decompress(zlib_data)
b'test'

Test for deflate:

>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
b'test'

Test for gzip:

>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
b'test'

The data is also compatible with the gzip module:

>>> import gzip
>>> import io
>>> fio = io.BytesIO(gzip_data)  # StringIO.StringIO on Python 2
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
b'test'
>>> f.close()

Automatic header detection (zlib or gzip)

Adding 32 to windowBits triggers header detection:

>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
b'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
b'test'
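Note that adding 32 only distinguishes zlib from gzip; a headerless raw deflate stream still fails the header check. Below is a minimal sketch of a fallback, assuming that any data which fails auto-detection is raw deflate (`decompress_auto` is a hypothetical helper name, not part of zlib):

```python
import gzip
import zlib


def decompress_auto(data: bytes) -> bytes:
    """Try zlib/gzip auto-detection first, then fall back to raw deflate."""
    try:
        # wbits = MAX_WBITS | 32: accept either a zlib or a gzip header
        return zlib.decompress(data, zlib.MAX_WBITS | 32)
    except zlib.error:
        # Assumption: no recognisable header means raw deflate
        return zlib.decompress(data, -zlib.MAX_WBITS)


raw = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_data = raw.compress(b"test") + raw.flush()

print(decompress_auto(gzip.compress(b"test")))  # b'test'
print(decompress_auto(zlib.compress(b"test")))  # b'test'
print(decompress_auto(raw_data))                # b'test'
```

The fallback hides the original error, so data that is simply corrupt will surface as a failure of the second call instead.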

Using gzip instead

Alternatively, you can ignore zlib and use the gzip module directly; but keep in mind that under the hood, gzip uses zlib.

fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
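The same read can be written with a context manager, which closes the file even if reading fails. This is only a sketch: `read_gzip_file` is a hypothetical helper, and the temporary `abc.gz` file exists just to keep the example self-contained.

```python
import gzip
import os
import tempfile


def read_gzip_file(path):
    """Return the decompressed contents of a gzip file."""
    with gzip.open(path, "rb") as fh:
        return fh.read()


# Create a small gzip file in a temporary directory for the demonstration
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "abc.gz")
with gzip.open(path, "wb") as fh:
    fh.write(b"test")

print(read_gzip_file(path))                      # b'test'
# For data that is already in memory, the module also has one-shot helpers
print(gzip.decompress(gzip.compress(b"test")))   # b'test'
```

If the file is not actually in gzip format, the read raises an `OSError` (on recent Python versions, the `gzip.BadGzipFile` subclass) rather than `zlib.error`.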

Answer 2

Update: dnozay's answer explains the problem and should be the accepted answer.

Try the gzip module; the code below is taken straight from the Python documentation.

import gzip
f = gzip.open('/home/joe/file.txt.gz', 'rb')
file_content = f.read()
f.close()

Answer 3

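I just solved the "incorrect header check" problem when uncompressing gzipped data.

You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the 2 version).

Yes, this can be very frustrating. A typical shallow reading of the documentation presents Zlib as an API to Gzip compression, but by default (when not using the gz* methods) it does not create or uncompress the Gzip format. You have to pass this not-very-prominently documented flag.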

Answer 4

This does not answer the original question, but it may help someone else who ends up here.

zlib.error: Error -3 while decompressing: incorrect header check

also occurs in the example below:

import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))
encoded_bytes_representation = str(b64_encoded_bytes)  # this is the cause
zlib.decompress(base64.b64decode(encoded_bytes_representation))

The example is a minimal reproduction of something I encountered in some legacy Django code, where Base64-encoded bytes (from an HTTP POST) were stored in a Django CharField (instead of a BinaryField).

When a CharField value is read from the database, str() is called on the value without an explicit encoding, as can be seen in the Django source.

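The str() documentation says:

If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).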

So, in the example, we are inadvertently base64-decoding

"b'eJxLTEpOSQUABcgB8A=='"

instead of

b'eJxLTEpOSQUABcgB8A=='.

The zlib decompression in the example would succeed if an explicit encoding were used, e.g. str(b64_encoded_bytes, 'utf-8').
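To make the difference concrete, here is a small sketch contrasting the two conversions. The variable names `as_repr` and `as_text` are chosen purely for illustration.

```python
import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))

# str() without an encoding returns the repr, including the b'...' wrapper
as_repr = str(b64_encoded_bytes)
# str() with an explicit encoding returns only the Base64 characters
as_text = str(b64_encoded_bytes, 'utf-8')

print(as_repr.startswith("b'"))   # True: the wrapper became part of the text
print(as_text.startswith("b'"))   # False

# Decoding the clean text round-trips correctly
print(zlib.decompress(base64.b64decode(as_text)))  # b'abcde'
```

Calling `b64_encoded_bytes.decode('ascii')` is an equivalent way to get the clean text, since Base64 output is plain ASCII.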

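Django-specific note:

What is especially tricky is that this issue only arises when the value is retrieved from the database. See, for example, the test below, which passes (in Django 3.0.3):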

class MyModelTests(TestCase):
    def test_bytes(self):
        my_model = MyModel.objects.create(data=b'abcde')
        self.assertIsInstance(my_model.data, bytes)  # issue does not arise
        my_model.refresh_from_db()
        self.assertIsInstance(my_model.data, str)  # issue does arise

where MyModel is

class MyModel(models.Model):
    data = models.CharField(max_length=100)

Answer 5

To decompress incomplete gzipped bytes that are in memory, dnozay's answer is useful, but it misses the zlib.decompressobj call, which I found to be necessary:

incomplete_decompressed_content = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(incomplete_gzipped_content)

Note that zlib.MAX_WBITS | 16 is 15 | 16, which is 31. For some background on wbits, see the zlib.decompress documentation.

Credit: Yann Vernier's answer, which notes the zlib.decompressobj call.
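The behaviour described in this answer can be sketched as follows. The sample data and the truncation at the halfway point are assumptions made only for the demonstration.

```python
import gzip
import zlib

# Build some sample data and cut the gzip stream in half to simulate
# an incomplete download
original = b"".join(b"line %d\n" % i for i in range(2000))
gzipped = gzip.compress(original)
incomplete = gzipped[: len(gzipped) // 2]

# The one-shot function refuses truncated input
try:
    zlib.decompress(incomplete, zlib.MAX_WBITS | 16)
    one_shot_failed = False
except zlib.error:
    one_shot_failed = True

# A decompression object returns whatever could be decoded so far
decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
partial = decompressor.decompress(incomplete)

print(one_shot_failed)               # True
print(original.startswith(partial))  # True: the output is a prefix of the data
print(decompressor.eof)              # False: the end of the stream was not reached
```

Checking `decompressor.eof` afterwards is a simple way to tell whether the input was complete.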

Answer 6

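Funnily enough, I ran into this error when trying to work with the Stack Overflow API from Python.

I managed to get it working with the GzipFile object from the gzip module, roughly like this: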

import gzip

# Context managers close both the raw file and the GzipFile wrapper
with open('abc.gz', 'rb') as raw_file, gzip.GzipFile(fileobj=raw_file) as gzip_file:
    file_contents = gzip_file.read()

Answer 7

My case was decompressing email messages stored in a Bullhorn database. The snippet is as follows:

import pyodbc
import zlib

cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0 ')

for msg in cursor.fetchall():
    # The magic is in the second parameter: a negative wbits value selects the raw deflate format
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)

Answer 8

If you are using Node.js, try the fflate package; it worked for me with gzip.

const fflate = require('fflate');

// `buffer` is assumed to hold the gzip-compressed bytes; `await` needs an async context
const decompressedData = await new Promise((resolve, reject) => {
    fflate.gunzip(buffer, (error, result) => {
        if (error) {
            reject(error);
        } else {
            resolve(result);
        }
    });
});
xml = Buffer.from(decompressedData).toString('UTF-8');

Answer 9

Just add the header 'Accept-Encoding': 'identity'

import requests

requests.get('http://gett.bike/', headers={'Accept-Encoding': 'identity'})

https://github.com/requests/requests/issues/3849
