Pydantic root_validator - is it OK to use a single one for the entire model instead of individual validators?

Question

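I am using Pydantic to create a Timeseries model based on pandas Timestamp (start, end) and Timedelta (period) objects. The model will be used by a small data analysis program with many configurations/scenarios.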

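I need to instantiate and validate various aspects of the Timeseries model based on two bool (include_end_period, allow_future) and one optional int (max_periods) config parameters. I then need to derive three new fields (timezone, total_duration, total_periods) and perform some additional validation.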
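The derivation of total_duration and total_periods described above can be sketched with the standard library alone. This is an illustration only, not the model's code: derive_periods is a hypothetical helper, and datetime/timedelta stand in for pandas Timestamp/Timedelta.

```python
from datetime import datetime, timedelta, timezone


def derive_periods(start, end, period, include_end_period=False):
    """Return (total_duration, total_periods) for the given range."""
    total_duration = end - start
    if include_end_period:
        # Count the period that begins at `end` as part of the series.
        total_duration += period
    # divmod on two timedeltas yields an int quotient and a timedelta remainder.
    total_periods, remainder = divmod(total_duration, period)
    if remainder:
        raise ValueError('Total duration not divisible by period length')
    return total_duration, total_periods


start = datetime(2023, 3, 1, tzinfo=timezone.utc)
end = datetime(2023, 3, 2, tzinfo=timezone.utc)
total_duration, total_periods = derive_periods(
    start, end, timedelta(minutes=5), include_end_period=True
)
print(total_duration, total_periods)
```

For a one-day range with a 5-minute period and the end period included, this gives 1 day and 5 minutes, i.e. 1445 / 5 = 289 periods. Using divmod keeps the divisibility check in integer arithmetic rather than comparing a float with its truncated value.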

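Because there are several cases where one value is needed while validating another, I was unable to get the expected results with the typical @validator approach. In particular, I would often get a KeyError for a missing key instead of the expected ValueError. The best solution I have found is to write one long @root_validator(pre=True) method.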
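To make the KeyError symptom concrete, here is a minimal plain-Python sketch with no Pydantic involved. In Pydantic v1, the values argument of a field validator contains only the fields that have already validated successfully, so a field that failed or was omitted is simply absent. The function names and the empty dict below are hypothetical and only mimic that situation.

```python
from datetime import datetime, timezone


def check_end_unsafe(end, values):
    # Indexing directly raises KeyError when 'start' is absent from `values`.
    if values['start'] > end:
        raise ValueError('Start timestamp cannot be later than end')
    return end


def check_end_guarded(end, values):
    # .get() returns None for an absent key, so the comparison is skipped
    # and the error for 'start' itself is left to be reported elsewhere.
    start = values.get('start')
    if start is not None and start > end:
        raise ValueError('Start timestamp cannot be later than end')
    return end


end = datetime(2023, 3, 2, tzinfo=timezone.utc)
values_without_start = {}  # hypothetical: 'start' failed earlier and was dropped

try:
    check_end_unsafe(end, values_without_start)
    outcome = 'no error'
except KeyError as exc:
    outcome = f'KeyError: {exc}'

print(outcome)
print(check_end_guarded(end, values_without_start) == end)
```

The guarded variant trades the KeyError for a silent skip, which is only reasonable if the absent field reports its own error somewhere else.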

from pydantic import BaseModel, ValidationError, root_validator, conint
from pandas import Timestamp, Timedelta


class Timeseries(BaseModel):
    start: Timestamp
    end: Timestamp
    period: Timedelta
    include_end_period: bool = False
    allow_future: bool = True
    max_periods: conint(gt=0, strict=True) | None = None
    
    # Derived values, do not pass as params
    timezone: str | None
    total_duration: Timedelta
    total_periods: conint(gt=0, strict=True)
    
    class Config:
        extra = 'forbid'
        validate_assignment = True
    
    
    @root_validator(pre=True)
    def _validate_model(cls, values):
        
        # Validate input values
        if values['start'] > values['end']:
            raise ValueError('Start timestamp cannot be later than end')
        if values['start'].tzinfo != values['end'].tzinfo:
            raise ValueError('Start, end timezones do not match')
        if values['period'] <= Timedelta(0):
            raise ValueError('Period must be a positive amount of time')
        
        # Set timezone
        timezone = values['start'].tzname()
        if 'timezone' in values and values['timezone'] != timezone:
            raise ValueError('Timezone param does not match start timezone')
        values['timezone'] = timezone
        
        # Set duration (add 1 period if including end period)
        total_duration = values['end'] - values['start']
        if values['include_end_period']:
            total_duration += values['period']
        if 'total_duration' in values and values['total_duration'] != total_duration:
            error_context = ' + 1 period (included end period)' if values['include_end_period'] else ''
            raise ValueError(f'Duration param does not match end - start timestamps{error_context}')
        values['total_duration'] = total_duration
        
        # Set total_periods
        periods_float: float = values['total_duration'] / values['period']
        if periods_float != int(periods_float):
            raise ValueError('Total duration not divisible by period length')
        total_periods = int(periods_float)
        if 'total_periods' in values and values['total_periods'] != total_periods:
            raise ValueError('Total periods param does not match')
        values['total_periods'] = total_periods
        
        # Validate future
        if not values['allow_future']:
            # Get current timestamp to floor of period (subtract 1 period if including end period)
            max_end: Timestamp = Timestamp.now(tz=values['timezone']).floor(freq=values['period'])
            if values['include_end_period']:
                max_end -= values['period']
            if values['end'] > max_end:
                raise ValueError('End period is future or current (incomplete)')
        
        # Validate derived values
        if values['total_duration'] < Timedelta(0):
            raise ValueError('Total duration must be positive amount of time')
        if values['max_periods'] and values['total_periods'] > values['max_periods']:
            raise ValueError('Total periods exceeds max periods param')
        
        return values

Instantiating the model in the happy case, with all config checks in use:

start = Timestamp('2023-03-01T00:00:00Z')
end = Timestamp('2023-03-02T00:00:00Z')
period = Timedelta('5min')

try:
    ts = Timeseries(start=start, end=end, period=period,
                    include_end_period=True, allow_future=False, max_periods=10000)
    print(ts.dict())
except ValidationError as e:
    print(e)

Output:

"""
{'start': Timestamp('2023-03-01 00:00:00+0000', tz='UTC'),
 'end': Timestamp('2023-03-02 00:00:00+0000', tz='UTC'),
 'period': Timedelta('0 days 00:05:00'),
 'include_end_period': True,
 'allow_future': False,
 'max_periods': 10000,
 'timezone': 'UTC',
 'total_duration': Timedelta('1 days 00:05:00'),
 'total_periods': 289}
"""

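Here, I believe all of my validations work as expected and raise the expected ValueErrors rather than the less helpful KeyErrors. Is this approach reasonable? It seems to go against the typical/recommended approach, and the @root_validator documentation is very brief compared to that of @validator.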

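I am also unhappy that I need to list the derived values (timezone, total_duration, total_periods) at the top of the model. This implies that they can/should be passed at instantiation, and it requires extra logic in my validator to check whether they were passed and whether they match the derived values. If I omitted them, they would not benefit from the default validation of types, constraints, etc., and I would be forced to change the config to extra='allow'. I would appreciate any tips on how to improve this.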
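For comparison only, the idea of derived fields that callers cannot pass can be sketched with standard-library dataclasses. This is not a Pydantic solution and it performs no type coercion or constraint checking; TimeseriesSketch is a hypothetical name, and datetime/timedelta stand in for the pandas types.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class TimeseriesSketch:
    start: datetime
    end: datetime
    period: timedelta
    include_end_period: bool = False
    # init=False keeps the derived fields out of the generated __init__,
    # so passing them at instantiation raises TypeError.
    total_duration: timedelta = field(init=False)
    total_periods: int = field(init=False)

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError('Start timestamp cannot be later than end')
        if self.period <= timedelta(0):
            raise ValueError('Period must be a positive amount of time')
        duration = self.end - self.start
        if self.include_end_period:
            duration += self.period
        periods, remainder = divmod(duration, self.period)
        if remainder:
            raise ValueError('Total duration not divisible by period length')
        self.total_duration = duration
        self.total_periods = periods


ts = TimeseriesSketch(
    start=datetime(2023, 3, 1, tzinfo=timezone.utc),
    end=datetime(2023, 3, 2, tzinfo=timezone.utc),
    period=timedelta(minutes=5),
    include_end_period=True,
)
print(ts.total_duration, ts.total_periods)
```

The point of the sketch is the shape: inputs are declared as ordinary fields, derived values are declared but excluded from the constructor, and all cross-field checks live in one place that runs after the inputs are set.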

Thanks!
