Unable to pickle data classes generated with make_dataclass

Question

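In my code, I generate classes at run time with the make_dataclass function.

The problem is that such data classes cannot be serialized with pickle, as shown in the code snippet below.

It is important to know that, for my application, these dataclass instances must be picklable, because they need to be transferred to a multiprocessing executor pool.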

from dataclasses import dataclass, make_dataclass, asdict
import pickle

# Standard data class generated using the decorator approach
@dataclass
class StdDataClass:
    num: int = 0


a = StdDataClass(12)
ser_a = pickle.dumps(a)
des_a = pickle.loads(ser_a)
# serialization and deserialization with pickle is working 
assert a.num == des_a.num


# Run time created class using the make_dataclass approach.
# In the real case, the name, the type and the default value of each field is 
# not available before the code is executed. The structure of this dataclass is
# known only at run-time.
fields = [('num', int, 0)]
B = make_dataclass('B', fields)
b = B(2)
try:
    # An attempt to serialize the object is triggering an exception
    # Can't pickle <class 'types.B'>: attribute lookup B on types failed
    ser_b = pickle.dumps(b)
    des_b = pickle.loads(ser_b)
    assert b.num == des_b.num
except pickle.PickleError as e:
    print(e)

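Serializing an instance of a class defined with make_dataclass raises an exception. I think this is actually to be expected, because the documentation says:

The following types can be pickled:
built-in constants (None, True, False, Ellipsis, and NotImplemented);
integers, floating-point numbers, complex numbers;
strings, bytes, bytearrays;
tuples, lists, sets, and dictionaries containing only picklable objects;
functions (built-in and user-defined) accessible from the top level of a module (using def, not lambda);
classes accessible from the top level of a module;
instances of such classes for which the result of calling __getstate__() is picklable (see the section Pickling Class Instances for details).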

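I think the problem is that there is no definition of class B at the top level of the module (the item about classes in the documentation list, shown in bold in the original post), and that this is why it fails, but I am not sure about this.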
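The hypothesis above can be checked directly. pickle stores a class by reference, that is, by its module name and qualified name, and looks it up again on load. The sketch below uses a hypothetical helper, resolvable_by_reference, which is not part of pickle or dataclasses and only mimics that lookup. What it prints for a make_dataclass class may depend on the Python version, so no expected output is given.

```python
import sys
from collections import OrderedDict
from dataclasses import make_dataclass


def resolvable_by_reference(cls) -> bool:
    """Hypothetical helper: can `cls` be found via its module and qualified name?"""
    obj = sys.modules.get(cls.__module__)
    for part in cls.__qualname__.split('.'):
        obj = getattr(obj, part, None)
    return obj is cls


# A class defined at the top level of an importable module resolves fine.
print(resolvable_by_reference(OrderedDict))

# For a run-time class, inspect where pickle would look for it.
B = make_dataclass('B', [('num', int, 0)])
print(B.__module__, B.__qualname__, resolvable_by_reference(B))
```

If the last value printed is False, the class cannot be found where its __module__ says it lives, which matches the "attribute lookup B on types failed" message.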

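The workaround I found is to convert the run-time data class into a dictionary, serialize the dictionary, and, when needed, deserialize the dictionary to recreate the data class.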

# A base data class with no data members but with a class method 'constructor' 
# and a convenience method to convert the class into a dictionary
@dataclass
class BaseDataClass:

    @classmethod
    def from_dict(cls, d: dict):
        new_instance = cls()
        for key in d:
            setattr(new_instance, key, d[key])
        return new_instance

    def to_dict(self):
        return asdict(self)

# Another data class defined with the make_dataclass approach but
# using BaseDataClass as base.
C = make_dataclass('C', fields, bases=(BaseDataClass,))
c = C(13)

# WORKAROUND
# 
# Instead of serializing the class object, I am pickling the 
# corresponding dictionary
ser_c = pickle.dumps(c.to_dict())

# Deserialize the dictionary and use it to recreate the dataclass
des_c = C.from_dict(pickle.loads(ser_c))

assert c.num == des_c.num

Although the workaround does work, I would like to know whether it is possible to teach pickle to do the same for any data class derived from BaseDataClass.

I tried writing a __reduce__ method, as well as __setstate__ and __getstate__, but without success.
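For reference, here is a minimal sketch of how a __reduce__ on the base class could work. It is not the code that was tried above. It assumes that field types are picklable, that default values do not need to survive the round trip, and that every field is accepted by __init__. The helper name _rebuild is hypothetical. The idea is to pickle a description of the class (name and field specs) together with the values, so the class object itself is never pickled.

```python
import pickle
from dataclasses import asdict, dataclass, fields, make_dataclass


def _rebuild(name, field_specs, values):
    """Recreate the run-time class from its description, then the instance."""
    cls = make_dataclass(name, field_specs, bases=(BaseDataClass,))
    return cls(**values)


@dataclass
class BaseDataClass:
    def __reduce__(self):
        # Describe the class by (name, type) pairs instead of pickling the class.
        # Assumption: field types are picklable and defaults are not preserved.
        specs = [(f.name, f.type) for f in fields(self)]
        values = {f.name: getattr(self, f.name) for f in fields(self)}
        return _rebuild, (type(self).__name__, specs, values)


C = make_dataclass('C', [('num', int, 0)], bases=(BaseDataClass,))
c = C(13)
restored = pickle.loads(pickle.dumps(c))
print(asdict(restored))
```

Note that the restored object belongs to a freshly created class, so type(restored) is not C and restored == c is False. Compare field values instead. The _rebuild function is itself pickled by reference, so it has to live at the top level of a module that the receiving process can import.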

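I thought about subclassing Pickler to get a custom reducer, but that is the approach recommended when you cannot modify the class of the object to be serialized (for example, when it is generated by an external library). I also do not know how to tell the multiprocessing module to use my Pickler subclass instead of the base class.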

Do you know how I could solve this problem?

Answer 1

You can make the class created with make_dataclass global, like this:

from dataclasses import make_dataclass
import pickle

fields = [('num', int, 0)]
B = make_dataclass('B', fields)
b = B(2)
B.__module__ = __name__  # pickle looks the class up by module name and class name
globals()['B'] = B  # Make class B global
ser_b = pickle.dumps(b)
des_b = pickle.loads(ser_b)
print(des_b.num, des_b == b)

Prints:

2 True
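The same idea can be wrapped in a small helper for classes whose names are only known at run time. This is a sketch, and make_picklable_dataclass is a hypothetical name, not part of dataclasses. It assumes that pickle finds a class through its __module__ and name, so it sets __module__ explicitly and stores the class as an attribute of that module.

```python
import pickle
import sys
from dataclasses import make_dataclass


def make_picklable_dataclass(name, fields, module_name=__name__, **kwargs):
    """Hypothetical helper: create a dataclass that pickle can find by reference."""
    cls = make_dataclass(name, fields, **kwargs)
    # Point the class at the module that will actually hold it ...
    cls.__module__ = module_name
    # ... and publish it there under its own name.
    setattr(sys.modules[module_name], name, cls)
    return cls


B = make_picklable_dataclass('B', [('num', int, 0)])
b = B(2)
des_b = pickle.loads(pickle.dumps(b))
print(des_b.num, des_b == b)
```

For a multiprocessing pool, the receiving process must be able to resolve the same module and name. That is worth verifying for the start method in use, since a freshly spawned worker only sees classes that are created when the module is imported.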
