Automatically merging multiple Pydantic models with overlapping fields

Question

It is a bit difficult to state my question accurately in one sentence.

I have the following models:

from pydantic import BaseModel


class Detail1(BaseModel):
    round: bool
    volume: float


class AppleData1(BaseModel):
    origin: str
    detail: Detail1


class Detail2(BaseModel):
    round: bool
    weight: float


class AppleData2(BaseModel):
    origin: str
    detail: Detail2

Here, `AppleData1` has an attribute `detail` of type `Detail1`. `AppleData2` has an attribute `detail` of type `Detail2`. I want to create an `Apple` class that contains all the attributes of `AppleData1` and `AppleData2`.

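Question (how to implement the algorithm?)

Is there a generic way to implement the following algorithm?

Whenever `AppleData1` and `AppleData2` have attributes with the same name:
- If they are of the same type, use one of them. For example, `AppleData1.origin` and `AppleData2.origin` are both of type `str`, so `Apple.origin` is also of type `str`.
- If they are of different types, merge them. For example, `AppleData1.detail` and `AppleData2.detail` are of types `Detail1` and `Detail2` respectively, so `Apple.detail` should contain all of their inner attributes.

Any common inner attribute always refers to the same physical quantity, so overriding is allowed. For example, `Detail1.round` and `Detail2.round` are both of type `bool`, so the resulting `Apple.detail.round` is also of type `bool`.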

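Expected result

The final result should be the same as the `Apple` model below. (The `Detail` class is defined here only to make the code below complete. The generic approach should not hard-code the `Detail` class.)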

class Detail(BaseModel):
    round: bool
    volume: float    
    weight: float

class Apple(BaseModel):
    origin: str
    detail: Detail

My solution (bad example)

class Detail(Detail1, Detail2):
    pass


class Apple(AppleData1, AppleData2):
    origin: str
    detail: Detail

print(Apple.schema_json())

This solution works, but it is too specific.

Here I have to pick out the `detail` attribute from `AppleData1` and `AppleData2` and specifically create the `Detail` class from `Detail1` and `Detail2`.
I also have to notice that `origin` is a common attribute of the same type (`str`), so I specifically hard-coded `origin: str` in the definition of the `Apple` class.

Answer 1

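Simplified solution

Implementing a custom recursive version of the `create_model` function to dynamically construct the "combined" model classes should work: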

from typing import TypeGuard, TypeVar
from pydantic import BaseModel, create_model
# Pydantic v1 only: SHAPE_SINGLETON and ModelField.type_ do not exist in v2.
from pydantic.fields import SHAPE_SINGLETON

M = TypeVar("M", bound=BaseModel)


def is_pydantic_model(obj: object) -> TypeGuard[type[BaseModel]]:
    return isinstance(obj, type) and issubclass(obj, BaseModel)


def create_combined_model(
    __name__: str,
    /,
    model1: type[M],
    model2: type[M],
) -> type[M]:
    field_overrides = {}
    for name, field1 in model1.__fields__.items():
        field2 = model2.__fields__.get(name)
        if field2 is None:
            continue
        if is_pydantic_model(field1.type_):
            assert field1.shape == SHAPE_SINGLETON, "No model collections allowed"
            assert is_pydantic_model(field2.type_), f"{name} with different types"
            sub_model = create_combined_model(
                f"Combined{field1.type_.__name__}{field2.type_.__name__}",
                field1.type_,
                field2.type_,
            )
            field_overrides[name] = (sub_model, field1.field_info)
        else:
            assert field1.annotation == field2.annotation, f"{name} with different types"
    return create_model(__name__, __base__=(model1, model2), **field_overrides)  # type: ignore

This incorporates the restrictions/assumptions you elaborated on in the comments regarding the models that can be combined.
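The recursive merge rule itself does not depend on Pydantic. As a minimal sketch on plain annotated classes (the names `merge_annotated` and `_has_fields` are hypothetical, and no validation or field metadata is involved), it could look roughly like this:

```python
from typing import get_type_hints


def _has_fields(tp: object) -> bool:
    """True for classes that declare at least one annotated attribute."""
    return isinstance(tp, type) and bool(get_type_hints(tp))


def merge_annotated(name: str, a: type, b: type) -> type:
    """Hypothetical helper: build a class whose annotations merge those of a and b."""
    merged = dict(get_type_hints(a))
    for field, type_b in get_type_hints(b).items():
        if field not in merged or merged[field] == type_b:
            # Only present in b, or same type on both sides: use it unchanged.
            merged[field] = type_b
            continue
        type_a = merged[field]
        if _has_fields(type_a) and _has_fields(type_b):
            # Two different nested classes: merge their attributes recursively.
            merged[field] = merge_annotated(
                f"Combined{type_a.__name__}{type_b.__name__}", type_a, type_b
            )
        else:
            raise TypeError(f"{field!r} has conflicting types: {type_a} vs {type_b}")
    return type(name, (), {"__annotations__": merged})


# Plain stand-ins for the models from the question (no BaseModel involved).
class Detail1:
    round: bool
    volume: float


class AppleData1:
    origin: str
    detail: Detail1


class Detail2:
    round: bool
    weight: float


class AppleData2:
    origin: str
    detail: Detail2


Apple = merge_annotated("Apple", AppleData1, AppleData2)
print(get_type_hints(Apple))
print(get_type_hints(get_type_hints(Apple)["detail"]))
```

Unlike `create_combined_model`, this sketch builds a fresh class instead of inheriting from both inputs, so it only carries over the annotations.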

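`create_combined_model` does not support combining fields annotated with `C[M]`, where `C` is any generic collection type and `M` is a subclass of `BaseModel`. That is what the `SHAPE_SINGLETON` check ensures. It would be possible to incorporate logic that combines such models while preserving the field shape (e.g. `list[Detail1]` and `list[Detail2]`), but I left that out because you did not explicitly ask for it and it is somewhat more involved.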
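To give an idea of what preserving the field shape might involve, here is a minimal sketch that works on bare annotations with `typing.get_origin` and `typing.get_args`. The helper `merge_collection_annotation` and the stand-in `inherit_both` are hypothetical names, and wiring this into `create_combined_model` (reading the full annotation from each field) is left out:

```python
from typing import Any, Callable, get_args, get_origin


def merge_collection_annotation(
    ann1: Any, ann2: Any, merge_item: Callable[[type, type], type]
) -> Any:
    """Hypothetical helper: turn list[A] and list[B] into list[merge_item(A, B)]."""
    origin1, origin2 = get_origin(ann1), get_origin(ann2)
    args1, args2 = get_args(ann1), get_args(ann2)
    if origin1 is None or origin1 is not origin2 or len(args1) != len(args2):
        raise TypeError(f"Incompatible annotations: {ann1} vs {ann2}")
    # Keep identical type arguments and merge only the ones that differ.
    merged_args = tuple(
        arg1 if arg1 == arg2 else merge_item(arg1, arg2)
        for arg1, arg2 in zip(args1, args2)
    )
    return origin1[merged_args]


class A:
    x: int


class B:
    y: str


def inherit_both(a: type, b: type) -> type:
    # Stand-in for a real model merge such as create_combined_model.
    return type(f"Combined{a.__name__}{b.__name__}", (a, b), {})


merged = merge_collection_annotation(list[A], list[B], inherit_both)
print(merged)
```

The same function also covers mappings such as `dict[str, A]` and `dict[str, B]`, because identical type arguments (`str` here) are passed through untouched.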

Demo

from pydantic import BaseModel


class AppleBase(BaseModel):
    foo: str


class DetailBase(BaseModel):
    round: bool


class Detail1(DetailBase):
    volume: float


class AppleData1(AppleBase):
    bar: int
    detail: Detail1


class Detail2(DetailBase):
    weight: float


class AppleData2(AppleBase):
    baz: float
    detail: Detail2


Apple = create_combined_model("Apple", AppleData1, AppleData2)
print(Apple.schema_json(indent=4))

Output

{
    "title": "Apple",
    "type": "object",
    "properties": {
        "foo": {
            "title": "Foo",
            "type": "string"
        },
        "baz": {
            "title": "Baz",
            "type": "number"
        },
        "detail": {
            "$ref": "#/definitions/CombinedDetail1Detail2"
        },
        "bar": {
            "title": "Bar",
            "type": "integer"
        }
    },
    "required": [
        "foo",
        "baz",
        "detail",
        "bar"
    ],
    "definitions": {
        "CombinedDetail1Detail2": {
            "title": "CombinedDetail1Detail2",
            "type": "object",
            "properties": {
                "round": {
                    "title": "Round",
                    "type": "boolean"
                },
                "weight": {
                    "title": "Weight",
                    "type": "number"
                },
                "volume": {
                    "title": "Volume",
                    "type": "number"
                }
            },
            "required": [
                "round",
                "weight",
                "volume"
            ]
        }
    }
}

Caveats

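One obvious drawback of this solution is that, because it creates the model classes dynamically, there is no way to properly convey the type of the resulting model to static analysis.

The way I wrote it, the function is generic in terms of the models passed in, so the return type will be inferred as either the join or the union (depending on the static type checker) of the two input model types `model1` and `model2`.

In the demo example, this means that some type checkers (such as Mypy) will infer the type of `Apple` as `AppleBase` (the join). That is of course not "wrong", but it is not as specific as we would like, because it fails to account for the presence of the `bar`, `baz` and `detail` attributes.

Type checkers that use unions will probably infer the type as `AppleData1 | AppleData2`. (I have not tested it, but I believe Pyright does this.) This may or may not be preferable: it will at least always cover the presence of the `detail` attribute (albeit with another union type, `Detail1 | Detail2`), but to such type checkers it will be ambiguous whether `Apple` has the `bar` or the `baz` attribute.

The ideal solution would be to define the return type as the "intersection" of the two model types passed to the function. Unfortunately, we do not have that type construct yet. None of this affects the runtime behavior of the constructed class, of course, but for IDE auto-suggestions and the like it is not ideal.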

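Therefore, I would still recommend your initial explicit approach of using multiple inheritance for all the models involved, unless your models become very large/complex and numerous.

Answer 2

It is 2024 now, and Daniil's solution did not work for me, so I made some changes. It now also accepts multiple models: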
