Does monkey-patching DEFAULT_PROTOCOL improve the performance of pickle.dumps?


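I noticed that it can make a substantial difference in speed whether you specify the protocol used by pickle.dumps via an argument or monkey-patch pickle.DEFAULT_PROTOCOL to the desired protocol version.

On Python 3.6, pickle.DEFAULT_PROTOCOL is 3 and pickle.HIGHEST_PROTOCOL is 4.

For objects up to a certain length, setting DEFAULT_PROTOCOL to 4 appears to be faster than passing protocol=4 as an argument.

In my tests, for example, with pickle.DEFAULT_PROTOCOL set to 4, pickling a list of length 1 by calling pickle.dumps(packet_list_1) takes 481 ns, while calling pickle.dumps(packet_list_1, protocol=4) takes 733 ns. That is a staggering ~52% speed penalty for passing the protocol explicitly instead of falling back to the default (which had previously been set to 4).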

  """
  pickle.DEFAULT_PROTOCOL = 4
  pickle.dumps(packet) vs pickle.dumps(packet, protocol=4):

  For a list with length 1 it's 481ns vs 733ns (~52% penalty).
  For a list with length 10 it's 763ns vs 999ns (~30% penalty).
  For a list with length 100 it's 2.99 µs vs 3.21 µs (~7% penalty).
  For a list with length 1000 it's 25.8 µs vs 26.2 µs (~1.5% penalty).
  For a list with length 1_000_000 it's 32 ms vs 32.4 ms (~1.13% penalty).
  """

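I have observed this behavior for instances, lists, strings, and arrays, which is everything I have tested so far. The effect diminishes as object size grows.

For dicts, I noticed the effect reverses at some point: for a dict of length 10**6 (with unique integer values), explicitly passing protocol=4 as an argument (269 ms) is faster than relying on the default set to 4 (286 ms).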

 """
 pickle.DEFAULT_PROTOCOL = 4 
 pickle.dumps(packet) vs pickle.dumps(packet, protocol=4):

 For a dict with length 1 it's 589 ns vs 811 ns (~38% penalty).
 For a dict with length 10 it's 1.59 µs vs 1.81 µs (~14% penalty).
 For a dict with length 100 it's 13.2 µs vs 12.9 µs (~2.3% improvement).
 For a dict with length 1000 it's 128 µs vs 129 µs (~0.8% penalty).
 For a dict with length 1_000_000 it's 306 ms vs 283 ms (~7.5% improvement).
 """

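Glancing at the pickle source, nothing caught my eye that might cause such variations.

How can this unexpected behavior be explained?

Are there any caveats to setting pickle.DEFAULT_PROTOCOL instead of passing the protocol as an argument, in order to take advantage of the improved speed?

(Timed with IPython's %timeit magic on Python 3.6.3, IPython 6.2.1, Windows 7)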
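For anyone reproducing these numbers outside IPython, the comparison can be sketched with the standard timeit module. The helper name time_dumps and the loop counts below are arbitrary choices for illustration, not part of the original measurement, and absolute timings will differ by machine and Python version.

```python
import pickle
import timeit

def time_dumps(obj, protocol=None, number=10_000, repeat=5):
    """Return the best per-call time in seconds for pickle.dumps(obj).

    protocol=None means the argument is omitted entirely, so whatever
    default the bound dumps implementation uses is what gets timed.
    """
    if protocol is None:
        stmt = lambda: pickle.dumps(obj)
    else:
        stmt = lambda: pickle.dumps(obj, protocol=protocol)
    # Taking the minimum over several repeats reduces timing noise.
    return min(timeit.repeat(stmt, number=number, repeat=repeat)) / number

packet_list_1 = [*range(1)]

timings = {
    "default": time_dumps(packet_list_1),
    "protocol=3": time_dumps(packet_list_1, protocol=3),
    "protocol=4": time_dumps(packet_list_1, protocol=4),
}
for label, seconds in timings.items():
    print(f"{label:>10}: {seconds * 1e9:8.1f} ns per call")
```

Timing protocol=3 explicitly alongside the no-argument call makes it easier to see whether the difference comes from the argument itself or from the protocol actually used.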

Some example code dump:

import pickle

# instances -------------------------------------------------------------
class Dummy: pass

dummy = Dummy()

pickle.DEFAULT_PROTOCOL = 3

"""
>>> %timeit pickle.dumps(dummy)
5.8 µs ± 33.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit pickle.dumps(dummy, protocol=4)
6.18 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>> %timeit pickle.dumps(dummy)
5.74 µs ± 18.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit pickle.dumps(dummy, protocol=4)
6.24 µs ± 26.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""

# lists -------------------------------------------------------------
packet_list_1 = [*range(1)]

pickle.DEFAULT_PROTOCOL = 3
"""
>>>%timeit pickle.dumps(packet_list_1)
476 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_1, protocol=4)
730 ns ± 2.22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_1)
481 ns ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_1, protocol=4)
733 ns ± 2.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
# --------------------------
packet_list_10 = [*range(10)]

pickle.DEFAULT_PROTOCOL = 3

"""
>>>%timeit pickle.dumps(packet_list_10)
714 ns ± 3.05 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_10, protocol=4)
978 ns ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_10)
763 ns ± 3.16 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>>%timeit pickle.dumps(packet_list_10, protocol=4)
999 ns ± 8.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
"""
# --------------------------
packet_list_100 = [*range(100)]

pickle.DEFAULT_PROTOCOL = 3

"""
>>>%timeit pickle.dumps(packet_list_100)
2.96 µs ± 5.16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>>%timeit pickle.dumps(packet_list_100, protocol=4)
3.22 µs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_100)
2.99 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>>%timeit pickle.dumps(packet_list_100, protocol=4)
3.21 µs ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""
# --------------------------
packet_list_1000 = [*range(1000)]

pickle.DEFAULT_PROTOCOL = 3

"""
>>>%timeit pickle.dumps(packet_list_1000)
26 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>>%timeit pickle.dumps(packet_list_1000, protocol=4)
26.4 µs ± 93.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_1000)
25.8 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>>%timeit pickle.dumps(packet_list_1000, protocol=4)
26.2 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
"""
# --------------------------
packet_list_1m = [*range(10**6)]

pickle.DEFAULT_PROTOCOL = 3

"""
>>>%timeit pickle.dumps(packet_list_1m)
32 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>%timeit pickle.dumps(packet_list_1m, protocol=4)
32.3 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
"""
pickle.DEFAULT_PROTOCOL = 4
"""
>>>%timeit pickle.dumps(packet_list_1m)
32 ms ± 52.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>%timeit pickle.dumps(packet_list_1m, protocol=4)
32.4 ms ± 466 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
"""
1 Answer

Let's reorganize your %timeit results by return value:

| DEFAULT_PROTOCOL | call                                    | %timeit           | returns                                                                                                                      |
|------------------+-----------------------------------------+-------------------+------------------------------------------------------------------------------------------------------------------------------|
|                3 | pickle.dumps(dummy)                     | 5.8 µs ± 33.5 ns  | b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.'                                                                                |
|                4 | pickle.dumps(dummy)                     | 5.74 µs ± 18.8 ns | b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.'                                                                                |
|                3 | pickle.dumps(dummy, protocol=4)         | 6.18 µs ± 10.4 ns | b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.'                  |
|                4 | pickle.dumps(dummy, protocol=4)         | 6.24 µs ± 26.7 ns | b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.'                  |
|                3 | pickle.dumps(packet_list_1)             | 476 ns ± 1.01 ns  | b'\x80\x03]q\x00cbuiltins\nrange\nq\x01K\x00K\x01K\x01\x87q\x02Rq\x03a.'                                                     |
|                4 | pickle.dumps(packet_list_1)             | 481 ns ± 2.12 ns  | b'\x80\x03]q\x00cbuiltins\nrange\nq\x01K\x00K\x01K\x01\x87q\x02Rq\x03a.'                                                     |
|                3 | pickle.dumps(packet_list_1, protocol=4) | 730 ns ± 2.22 ns  | b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00]\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94K\x00K\x01K\x01\x87\x94R\x94a.' |
|                4 | pickle.dumps(packet_list_1, protocol=4) | 733 ns ± 2.94 ns  | b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00]\x94\x8c\x08builtins\x94\x8c\x05range\x94\x93\x94K\x00K\x01K\x01\x87\x94R\x94a.' |

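Notice how closely the %timeit results correspond when we pair up the calls that produce the same return value.

As you can see, the value of pickle.DEFAULT_PROTOCOL has no effect on the value returned by pickle.dumps. If the protocol parameter is not specified, the default protocol is 3 regardless of the value of pickle.DEFAULT_PROTOCOL.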
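This can be checked directly, because a pickle written with protocol 2 or higher starts with the PROTO opcode (b'\x80') followed by one byte holding the protocol number. The sketch below is a minimal check under that assumption; protocol_of is a hypothetical helper name, and the default it reports depends on the Python version (3 on Python 3.6, higher on later releases).

```python
import pickle

def protocol_of(data: bytes) -> int:
    """Return the protocol number stored after the PROTO opcode.

    Only valid for protocol >= 2; older protocols have no PROTO opcode.
    """
    if data[:1] != b"\x80":
        raise ValueError("pickle does not start with a PROTO opcode")
    return data[1]

original = pickle.DEFAULT_PROTOCOL
results = {}
try:
    # Patch the module attribute to two different values and record
    # which protocol pickle.dumps really uses each time.
    for patched in (2, pickle.HIGHEST_PROTOCOL):
        pickle.DEFAULT_PROTOCOL = patched
        results[patched] = protocol_of(pickle.dumps([0]))
finally:
    pickle.DEFAULT_PROTOCOL = original  # always undo the monkey-patch

for patched, used in results.items():
    print(f"DEFAULT_PROTOCOL = {patched} -> dumps used protocol {used}")
```

When pickle.dumps is the compiled version, both lines should report the same protocol, which shows that the patch is ignored.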

The reason is here:

# Use the faster _pickle if possible
try:
    from _pickle import (
        PickleError,
        PicklingError,
        UnpicklingError,
        Pickler,
        Unpickler,
        dump,
        dumps,
        load,
        loads
    )
except ImportError:
    Pickler, Unpickler = _Pickler, _Unpickler
    dump, dumps, load, loads = _dump, _dumps, _load, _loads

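The pickle module sets pickle.dumps to _pickle.dumps if it succeeds in importing _pickle, which is the compiled version of the pickle module. The _pickle module uses protocol=3 by default. Only if Python fails to import _pickle is dumps set to the Python version: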

def _dumps(obj, protocol=None, *, fix_imports=True):
    f = io.BytesIO()
    _Pickler(f, protocol, fix_imports=fix_imports).dump(obj)
    res = f.getvalue()
    assert isinstance(res, bytes_types)
    return res
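Which of the two implementations is bound to pickle.dumps in a given interpreter can be checked with an identity comparison. This is a small sketch; the variable names are illustrative, and pickle._dumps is a private name that is assumed here to exist as in the source shown above.

```python
import pickle

try:
    import _pickle  # C accelerator; may be missing on some interpreters
except ImportError:
    _pickle = None

# `from _pickle import dumps` rebinds the same function object, so an
# identity check tells us which implementation is in use.
uses_c_dumps = _pickle is not None and pickle.dumps is _pickle.dumps
uses_python_dumps = pickle.dumps is pickle._dumps

print("pickle.dumps is the C version:     ", uses_c_dumps)
print("pickle.dumps is the Python version:", uses_python_dumps)
```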

Only the Python version, _dumps, is affected by the value of pickle.DEFAULT_PROTOCOL:

In [68]: pickle.DEFAULT_PROTOCOL = 3

In [70]: pickle._dumps(dummy)
Out[70]: b'\x80\x03c__main__\nDummy\nq\x00)\x81q\x01.'

In [71]: pickle.DEFAULT_PROTOCOL = 4

In [72]: pickle._dumps(dummy)
Out[72]: b'\x80\x04\x95\x1b\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Dummy\x94\x93\x94)}\x94\x92\x94.'
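If the goal is simply to avoid repeating the protocol argument at every call site, one option that does not rely on monkey-patching is to bind the argument once with functools.partial. This is a suggestion rather than something from the measurements above, and dumps_v4 is a hypothetical name. It does not recover the speed seen in the question, since those faster calls were producing protocol 3 output.

```python
import functools
import pickle

# Bind protocol=4 once; works the same with the C and Python versions.
dumps_v4 = functools.partial(pickle.dumps, protocol=4)

payload = dumps_v4([0])

# The second byte of a protocol >= 2 pickle is the protocol number.
print("protocol used:", payload[1])
print("round trip ok:", pickle.loads(payload) == [0])
```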