另类标题:遍历两个节点之间树中的所有节点。
我有一个 DOM 树(有序树,其中每个级别的子元素都有特定的顺序)。我在树中得到了
start
和 stop
节点。在树的深度优先遍历中,保证 start
位于 stop
之前。我想高效地找到/遍历这两个节点之间的所有节点。这意味着,如果我可以确定两者之间的节点不包含 stop
节点,我不想递归其所有后代。
这是一个众所周知的(命名的?)问题,它具有遍历算法,可能会计算共享祖先或类似的东西来确定是否遍历节点的后代?
示例0(同级):
<body>
<p>ignore this</p>
<div id="start">start here</div>
<p id="A">visit A</p>
<p id="B">visit B</p>
<p id="C">visit C <span>but <b>not</b> <i>all this</i></span></p>
<div id="stop">stop here</div>
<p>ignore this</p>
</body>
在示例 0 中,我想按顺序访问以下内容:
#A
#B
#C
(但不是其所有后代)示例1(止损位更深):
<body>
<p>ignore this</p>
<div><h2><span id="start">start here</span> A1</h2> A2</div>
<section id="B"><many><more><levels><of><content>…</many></section>
<section id="C">
<div id="D"><many><more><levels><of><content>…</many></div>
<div id="E">
<div id="F"><many><more><levels><of><content>…</many></div>
<div id="stop">stop here</div>
<p>ignore this</p>
</div>
</section>
<p>ignore this</p>
</body>
在示例 1 中,我想按顺序访问以下内容:
A1
和 A2
#B
(但不是其所有后代)#C
(或者也许不是?)#D
(但不是其所有后代)#E
(或者也许不是?)#F
(但不是其所有后代)示例2(止损更高):
<body>
<p>ignore this</p>
<div>
<h2>
<span id="start">start here</span>
<div id="A">visit A</div>
<section id="B"><many><more><levels><of><content>…</many></section>
</h2>
<section id="C"><many><more><levels><of><content>…</many></section>
</div>
<section id="D"><many><more><levels><of><content>…</many></section>
<div id="stop">stop here</div>
<p>ignore this</p>
</body>
在示例 2 中,我想按顺序访问以下内容:
#A
#B
(但不是其后代)#C
(但不是其后代)#D
(但不是其后代)这是一个似乎可以做我想要的事情的算法,用伪代码表示:
results = []
ancestors = get_ancestors_of(stop_node)
current = start_node
while current and current != stop_node:
if current in ancestors:
current = current.children[0]
else:
if current != start_node:
results.add(current)
if current.next_sibling:
current = current.next_sibling
else:
while current and !current.next_sibling:
current = current.parent
if current:
current = current.next_sibling
return results
这是 Ruby 中的方法,它接受一个块并在找到节点时生成节点:
# Traverse nodes between two in the tree, yielding each to the block
# but without traversing the descendants that are wholly between the two.
def traverse_between(start_node, end_node)
# Get all ancestors of the end_node
end_node_ancestors = end_node ? end_node.ancestors.to_set : Set.new
current = start_node
while current && current != end_node
# Traverse children if the end node is in there somewhere.
# We don't yield ancestors of the end_node, since
# they implicitly contain the end_node and possibly more.
if end_node_ancestors.include?(current)
current = current.element_children.first
else
yield current unless current == start_node
if current.next_sibling
current = current.next_sibling
else
# Move up to the next ancestor with a sibling; stop if we reach the root
while current && !current.next_sibling
current = current.parent
return if current.is_a?(Nokogiri::XML::Document)
end
current = current.next_sibling if current
end
end
end
end