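I have to use DeepDiff to compare two lists of dictionaries that hold metrics for certain variables. The comparison should work like this: if a new metric is equal to or greater than the previous one, no difference between the dictionaries should be reported, but if the new metric is smaller, the difference should be indicated.
Here is an example of my lists: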
old = [
    {
        'variable': 'location',
        'accuracy': 0.6338672768878718,
        'coverage': 0.9278131634819533,
    },
    {
        'variable': 'operating_name',
        'accuracy': 0.7156488549618321,
        'coverage': 0.16129032258064516,
    },
    {
        'variable': 'years_in_business',
        'accuracy': 0.8686224489795918,
        'coverage': 0.48590021691973967,
    },
]
new = [
    {
        'variable': 'location',
        'accuracy': 0.6227561657767604,
        'coverage': 0.9267020523708422,
    },
    {
        'variable': 'operating_name',
        'accuracy': 0.8883720930232558,
        'coverage': 0.49710982658959535,
    },
    {
        'variable': 'years_in_business',
        'accuracy': 0.8564488549618321,
        'coverage': 0.4124206283185841,
    },
]
With the following custom operator I have managed to perform the comparison as described, without any problems:
from math import isclose

from deepdiff import DeepDiff
from deepdiff.operator import BaseOperator


class CloseToOrGreatherThan(BaseOperator):
    def __init__(self, types):
        super().__init__(types=types)

    def give_up_diffing(self, level, diff_instance) -> bool:
        # DeepDiff is called as DeepDiff(new, old), so t1 is the new item and t2 the old one
        new = level.t1
        old = level.t2
        new_acc = round(new["accuracy"] * 100, 1)
        base_acc = round(old["accuracy"] * 100, 1)
        # True when the new accuracy is close to or greater than the old one
        acc_diff = isclose(new_acc, base_acc, abs_tol=0.9) or new_acc > base_acc
        new_cvr = round(new["coverage"] * 100, 1)
        base_cvr = round(old["coverage"] * 100, 1)
        # True when the new coverage is close to or greater than the old one
        cvr_diff = isclose(new_cvr, base_cvr, abs_tol=0.9) or new_cvr > base_cvr
        # if either the new accuracy or the new coverage is less than the old one, mark the difference
        if not (acc_diff and cvr_diff):
            report = f"accuracy: {old['accuracy']} => {new['accuracy']}" if not acc_diff else ""
            report = " / ".join(s for s in [report, f"coverage: {old['coverage']} => {new['coverage']}"] if s) if not cvr_diff else report
            diff_instance.custom_report_result(old['variable'], level, report)
        # Returning True tells DeepDiff not to diff these two dicts any further
        return True
Running DeepDiff like this:
DeepDiff(new, old, custom_operators=[CloseToOrGreatherThan(types=[dict])])
The output is as expected:
{'location': {'root[0]': 'accuracy: 0.6338672768878718 => 0.6227561657767604'}, 'years_in_business': {'root[2]': 'accuracy: 0.8686224489795918 => 0.8564488549618321 / coverage: 0.48590021691973967 => 0.4124206283185841'}}
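The pass/fail rule inside the operator can be checked on its own, without DeepDiff. The sketch below uses only the standard library; the helper name `regressed_metrics` is hypothetical and not part of DeepDiff. It applies the same rounding and the same `abs_tol=0.9` tolerance to one old/new pair and lists the metrics that dropped.

```python
from math import isclose


def regressed_metrics(old_item, new_item, abs_tol=0.9):
    """Return the metric names whose new value is neither close to nor above the old one."""
    regressed = []
    for metric in ("accuracy", "coverage"):
        # Same transformation as in the operator: percentage rounded to one decimal
        new_pct = round(new_item[metric] * 100, 1)
        old_pct = round(old_item[metric] * 100, 1)
        ok = isclose(new_pct, old_pct, abs_tol=abs_tol) or new_pct > old_pct
        if not ok:
            regressed.append(metric)
    return regressed


old_location = {"variable": "location", "accuracy": 0.6338672768878718, "coverage": 0.9278131634819533}
new_location = {"variable": "location", "accuracy": 0.6227561657767604, "coverage": 0.9267020523708422}

print(regressed_metrics(old_location, new_location))  # prints ['accuracy']
```

For `location`, accuracy goes from 63.4 to 62.3 (a drop of 1.1, outside the tolerance), while coverage goes from 92.8 to 92.7 (inside the tolerance), which agrees with the report shown above.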
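However, the problem I am running into is that the comparison no longer works if the variables appear in a different order in the two lists of dictionaries. That is, if the lists now look like this: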
old = [
    {
        'variable': 'location',
        'accuracy': 0.6338672768878718,
        'coverage': 0.9278131634819533,
    },
    {
        'variable': 'operating_name',
        'accuracy': 0.7156488549618321,
        'coverage': 0.16129032258064516,
    },
    {
        'variable': 'years_in_business',
        'accuracy': 0.8686224489795918,
        'coverage': 0.48590021691973967,
    },
]
new = [
    {
        'variable': 'years_in_business',
        'accuracy': 0.8564488549618321,
        'coverage': 0.4124206283185841,
    },
    {
        'variable': 'location',
        'accuracy': 0.6227561657767604,
        'coverage': 0.9267020523708422,
    },
    {
        'variable': 'operating_name',
        'accuracy': 0.8883720930232558,
        'coverage': 0.49710982658959535,
    },
]
DeepDiff will compare location with years_in_business, operating_name with location, and years_in_business with operating_name.
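The mismatch is simply what index-based pairing produces. The short sketch below only mimics that pairing with `zip` on the variable names; it is an illustration, not DeepDiff's internal code, and the names `old_order` and `new_order` are made up for this example.

```python
# Variable names in the order they appear in each list
old_order = ["location", "operating_name", "years_in_business"]
new_order = ["years_in_business", "location", "operating_name"]

# Index-based pairing: item i of one list is compared with item i of the other
pairs = list(zip(old_order, new_order))
for old_name, new_name in pairs:
    print(f"{old_name} <-> {new_name}")
```

Every pair ends up mixing two different variables, so the accuracy and coverage values being compared do not belong together.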
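I tried using iterable_compare_func to tell DeepDiff how to match the variables, but it did not work as I expected. What I want is for an item in the old list to be compared with an item in the new list if and only if their variable names are the same: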
from deepdiff.helper import CannotCompare


def compare(x, y, level):
    # Two items match when they describe the same variable
    try:
        return x['variable'] == y['variable']
    except Exception:
        raise CannotCompare() from None
When I call DeepDiff with both parameters, it returns {}:
DeepDiff(new, old, custom_operators=[CloseToOrGreatherThan(types=[dict])], iterable_compare_func=compare)
Calling DeepDiff with verbose_level=2 returns the following, which is not the result I expect:
{'iterable_item_moved': {'root[0]': {'new_path': 'root[2]', 'value': {'variable': 'years_in_business', 'accuracy': 0.8686224489795918, 'coverage': 0.48590021691973967}}, 'root[1]': {'new_path': 'root[0]', 'value': {'variable': 'location', 'accuracy': 0.6338672768878718, 'coverage': 0.9278131634819533}}, 'root[2]': {'new_path': 'root[1]', 'value': {'variable': 'operating_name', 'accuracy': 0.7156488549618321, 'coverage': 0.16129032258064516}}}}
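Do you know what I am doing wrong, or how I can achieve what I need? If possible, I would like to solve this with DeepDiff's own features, without having to preprocess the lists before calling the function.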
Have you tried the "ignore_order" parameter?
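In case no DeepDiff-only option pans out, one fallback is to bring both lists into the same order first. This does preprocess the lists, which the question would prefer to avoid, so it is offered only as a sketch of a workaround. It assumes both lists contain exactly the same set of `variable` names; the names `old_sorted` and `new_sorted` are introduced here for illustration.

```python
from operator import itemgetter

old = [
    {"variable": "location", "accuracy": 0.6338672768878718, "coverage": 0.9278131634819533},
    {"variable": "operating_name", "accuracy": 0.7156488549618321, "coverage": 0.16129032258064516},
    {"variable": "years_in_business", "accuracy": 0.8686224489795918, "coverage": 0.48590021691973967},
]
new = [
    {"variable": "years_in_business", "accuracy": 0.8564488549618321, "coverage": 0.4124206283185841},
    {"variable": "location", "accuracy": 0.6227561657767604, "coverage": 0.9267020523708422},
    {"variable": "operating_name", "accuracy": 0.8883720930232558, "coverage": 0.49710982658959535},
]

# Sort both lists by variable name so that equal indexes refer to the same variable
old_sorted = sorted(old, key=itemgetter("variable"))
new_sorted = sorted(new, key=itemgetter("variable"))

for o, n in zip(old_sorted, new_sorted):
    print(o["variable"], "<->", n["variable"])
```

The sorted lists could then be passed to DeepDiff in place of the originals, together with the custom operator, which is the in-order situation that already works in the first example. Note that any `root[i]` paths in the report would refer to positions in the sorted lists, not the original ones.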