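I'd like to be able to generate anchors in the YAML produced by PyYAML's dump() function. Is there a way to do this? Ideally, the anchors would have the same names as the YAML nodes.
Example: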
import yaml
yaml.dump({'a': [1,2,3]})
'a: [1, 2, 3]\n'
What I'd like to be able to do is generate YAML like this:
import yaml
yaml.dump({'a': [1,2,3]})
'a: &a [1, 2, 3]\n'
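Can I write a custom emitter or dumper to do this? Is there another way?
By default, anchors are only emitted when a reference to a previously seen object is detected: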
>>> import yaml
>>>
>>> foo = {'a': [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.safe_dump(doc, default_flow_style=False))
- &id001
  a:
  - 1
  - 2
  - 3
- *id001
If you want to override how the anchor is named, you'll have to customize the Dumper class, specifically the generate_anchor() function. ANCHOR_TEMPLATE may also be useful.
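As a minimal sketch of the ANCHOR_TEMPLATE route: the default template is 'id%03d' and it is filled in with a running counter, so a subclass can change just the prefix. The class name RefDumper and the 'ref' prefix are made up for illustration.

```python
import yaml


class RefDumper(yaml.SafeDumper):
    # The inherited default is 'id%03d'; the placeholder receives a counter.
    ANCHOR_TEMPLATE = "ref%03d"


foo = {"a": [1, 2, 3]}
doc = [foo, foo]  # the same object twice, so an anchor and an alias are emitted

out = yaml.dump(doc, Dumper=RefDumper, default_flow_style=False)
print(out)
```

This only changes the generated name. The anchor is still emitted only for objects that are referenced more than once.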
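In your example the node name is straightforward, but you need to take into account the many forms a YAML key can take, i.e. it could be a sequence rather than a single scalar: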
>>> import yaml
>>>
>>> foo = {('a', 'b', 'c'): [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.dump(doc, default_flow_style=False))
!!python/tuple
- &id001
  ? !!python/tuple
  - a
  - b
  - c
  : - 1
    - 2
    - 3
- *id001
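It's not that easy, unless the data you want to use for the anchor is inside the node itself. That's because the anchor is attached to the node's content, "[1,2,3]" in your example, and the node doesn't know that this value is associated with the key "a".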
import yaml

l = [1, 2, 3]
foo = {'a': l, 'b': l}

class SpecialAnchor(yaml.Dumper):
    def generate_anchor(self, node):
        print('Generating anchor for {}'.format(str(node)))
        anchor = super().generate_anchor(node)
        print('Generated "{}"'.format(anchor))
        return anchor

y1 = yaml.dump(foo, Dumper=SpecialAnchor)
print(y1)
This gives you:
Generating anchor for SequenceNode(
tag='tag:yaml.org,2002:seq', value=
[ScalarNode(tag='tag:yaml.org,2002:int', value='1'),
ScalarNode(tag='tag:yaml.org,2002:int', value='2'),
ScalarNode(tag='tag:yaml.org,2002:int', value='3')]
)
Generated "id001"
a: &id001 [1, 2, 3]
b: *id001
So far I haven't found a way to get the key "a" given the node...
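To see why the key is out of reach from inside generate_anchor(), it can help to look at the node graph directly. A minimal sketch, using yaml.compose() on a one-line document chosen for illustration: the mapping node holds (key, value) pairs, but the value node itself carries no reference back to its key or parent.

```python
import yaml

# compose() stops at the node graph instead of building Python objects.
root = yaml.compose("a: [1, 2, 3]")

# A MappingNode stores its entries as a list of (key_node, value_node) pairs.
key_node, value_node = root.value[0]
print(type(root).__name__)        # MappingNode
print(key_node.value)             # a
print(type(value_node).__name__)  # SequenceNode

# List the attributes the value node carries; none of them is the key or the parent.
attrs = vars(value_node)
print(sorted(attrs))
has_backlink = any(v is root or v is key_node for v in attrs.values())
print(has_backlink)
```

So any approach that names an anchor after its key has to pick up the key while visiting the parent mapping, not while visiting the value.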
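I wrote a custom anchor class to force an anchor value for the top-level node. It doesn't simply override the anchor string (via generate_anchor); it actually forces the anchor to be emitted, even if the node isn't referenced again later: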
import yaml

class CustomAnchor(yaml.Dumper):
    def __init__(self, *args, **kwargs):
        super(CustomAnchor, self).__init__(*args, **kwargs)
        self.depth = 0
        self.basekey = None
        self.newanchors = {}

    def anchor_node(self, node):
        self.depth += 1
        if self.depth == 2:
            # second node visited: the top-level key
            assert isinstance(node, yaml.ScalarNode), "yaml node not a string: %s" % node
            self.basekey = str(node.value)
            node.value = self.basekey + "_ALIAS"
        if self.depth == 3:
            # third node visited: the value belonging to that key
            assert self.basekey, "could not find base key for value: %s" % node
            self.newanchors[node] = self.basekey
        super(CustomAnchor, self).anchor_node(node)
        if self.newanchors:
            self.anchors.update(self.newanchors)
            self.newanchors.clear()
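Note that I'm overriding the node name with an "_ALIAS" suffix, but you could remove that line to keep the node name and anchor name the same, or change it to something else.
E.g. dumping {'FOO': 'BAR'} results in:
FOO_ALIAS: &FOO BAR
Also, I only wrote it to handle a single top-level key/value pair at a time, and it will only force an anchor for the top-level key. If you want to turn a dict into a YAML file in which every key is a top-level YAML node, you'll need to iterate over the dict and dump each key/value pair as {key: value}, or rewrite this class to handle a dict with multiple keys.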
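The "iterate over the dict" option can be sketched roughly like this. TopLevelAnchor is a copy of the class above without the "_ALIAS" line, and dump_with_key_anchors is a hypothetical helper name. It assumes every key is a string that is also a valid anchor name (letters, digits, '-' and '_').

```python
import yaml


class TopLevelAnchor(yaml.Dumper):
    """Copy of the class above, minus the "_ALIAS" suffix (hypothetical name)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.depth = 0
        self.basekey = None
        self.newanchors = {}

    def anchor_node(self, node):
        self.depth += 1
        if self.depth == 2:  # second node visited: the single top-level key
            assert isinstance(node, yaml.ScalarNode), "yaml node not a string: %s" % node
            self.basekey = str(node.value)
        if self.depth == 3:  # third node visited: that key's value
            assert self.basekey, "could not find base key for value: %s" % node
            self.newanchors[node] = self.basekey
        super().anchor_node(node)
        if self.newanchors:
            self.anchors.update(self.newanchors)
            self.newanchors.clear()


def dump_with_key_anchors(data):
    # yaml.dump() builds a fresh dumper per call, so the visit counter
    # restarts for every {key: value} pair.
    return "".join(yaml.dump({k: v}, Dumper=TopLevelAnchor) for k, v in data.items())


out = dump_with_key_anchors({"FOO": "BAR", "BAZ": [1, 2]})
print(out)
```

Because each pair is dumped as its own one-key mapping and the pieces are concatenated, the result still reads back as a single mapping.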
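This question is quite old, and aaa90210 already gave some good pointers in his answer, but the class provided there didn't really do what I wanted and I don't think it generalizes well.
I tried to come up with a dumper that allows adding anchors and makes sure a corresponding alias is created when the key shows up again later in the file. It is by no means full-featured and could probably be made safer, but I hope it can serve as inspiration for others: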
import yaml
from typing import Dict

class CustomAnchor(yaml.Dumper):
    """Custom Dumper class to create anchors for keys throughout the YAML file.

    Attributes:
        added_anchors: mapping of key names to the node objects representing their value, for nodes that have an anchor
    """

    def __init__(self, *args, **kwargs):
        """Initialize class.

        We call the constructor of the parent class.
        """
        super().__init__(*args, **kwargs)
        self.filter_keys = ['a', 'b']
        self.added_anchors: Dict[str, yaml.ScalarNode] = {}

    def anchor_node(self, node):
        """Override method from parent class.

        This method first checks if the node contains the keys of interest, and if anchors already exist for these keys,
        replaces the reference to the value node with the one that the anchor points to. In case no anchor exists for
        those keys, it creates them and keeps a reference to the value node in the ``added_anchors`` attribute.

        Args:
            node (yaml.Node): the node being processed by the dumper
        """
        if isinstance(node, yaml.MappingNode):
            # let's check through the mapping to find keys which are of interest
            for i, (key_node, value_node) in enumerate(node.value):
                if (
                    isinstance(key_node, yaml.ScalarNode)
                    and key_node.value in self.filter_keys
                ):
                    if key_node.value in self.added_anchors:  # anchor exists
                        # replace value node to tell the dumper to create an alias
                        node.value[i] = (key_node, self.added_anchors[key_node.value])
                    else:  # no anchor yet exists but we need to create one
                        self.anchors.update({value_node: key_node.value})
                        self.added_anchors[key_node.value] = value_node
        super().anchor_node(node)
import yaml

class _CustomAnchor(yaml.Dumper):
    anchor_tags = {}

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.new_anchors = {}
        self.anchor_next = None

    def anchor_node(self, node):
        # the previous node matched a tag, so this node receives the anchor
        if self.anchor_next is not None:
            self.new_anchors[node] = self.anchor_next
            self.anchor_next = None
        # a scalar whose text is in anchor_tags marks the next node for anchoring
        if isinstance(node.value, str) and node.value in self.anchor_tags:
            self.anchor_next = self.anchor_tags[node.value]
        super().anchor_node(node)
        if self.new_anchors:
            self.anchors.update(self.new_anchors)
            self.new_anchors.clear()

def CustomAnchor(tags):
    return type('CustomAnchor', (_CustomAnchor,), {'anchor_tags': tags})

foo = {'a': [1, 2, 3]}  # example data, as in the question
print(yaml.dump(foo, Dumper=CustomAnchor({'a': 'a_name'})))
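This doesn't provide a way to distinguish between two nodes whose names have the same value. That would need a YAML equivalent of XML's XPath, which I don't see in PyYAML :(
CustomAnchor lets you pass a dict of anchors keyed by node value: {value: anchor_name}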
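One possible way around the missing XPath equivalent is to walk the node graph yourself before serialization and key the anchors on the path to each node rather than on its value. The following is only a sketch under that assumption: _PathAnchor, PathAnchor, path_anchors and _collect are made-up names, paths are tuples of mapping-key text and sequence indexes, and anchor names are assumed to be valid (letters, digits, '-' and '_').

```python
import yaml


class _PathAnchor(yaml.Dumper):
    # Hypothetical attribute: {path_tuple: anchor_name}, e.g. {("y", "name"): "yname"}
    path_anchors = {}

    def serialize(self, node):
        # Walk the node graph once, tracking paths, before anchors are assigned.
        self._forced = {}
        self._collect(node, (), set())
        super().serialize(node)

    def _collect(self, node, path, seen):
        if node in seen:  # shared or recursive node: only its first path counts
            return
        seen.add(node)
        name = self.path_anchors.get(path)
        if name is not None:
            self._forced[node] = name
        if isinstance(node, yaml.MappingNode):
            for key_node, value_node in node.value:
                if isinstance(key_node, yaml.ScalarNode):
                    self._collect(value_node, path + (key_node.value,), seen)
        elif isinstance(node, yaml.SequenceNode):
            for index, item in enumerate(node.value):
                self._collect(item, path + (index,), seen)

    def anchor_node(self, node):
        super().anchor_node(node)
        # Overwrite whatever the base class chose (None or an idNNN name).
        if node in self._forced:
            self.anchors[node] = self._forced[node]


def PathAnchor(path_anchors):
    return type("PathAnchor", (_PathAnchor,), {"path_anchors": path_anchors})


# Two nodes with the same key name; only the one under "y" gets an anchor.
data = {"x": {"name": "n1"}, "y": {"name": "n2"}}
out = yaml.dump(data, Dumper=PathAnchor({("y", "name"): "yname"}))
print(out)

# The case from the question: an anchor named after the top-level key.
out2 = yaml.dump({"a": [1, 2, 3]}, Dumper=PathAnchor({("a",): "a"}), default_flow_style=None)
print(out2)
```

Keys are matched by their scalar text, so an integer key 1 would appear in a path as '1'. Mappings with non-scalar keys are skipped by this sketch.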