在 HTML 中的两个元素之间选择不同级别的元素

问题描述 投票:0回答:1

另类标题:遍历两个节点之间树中的所有节点。

我有一个 DOM 树(有序树,其中每个级别的子元素都有特定的顺序)。我在树中得到了

start
stop
节点。在树的深度优先遍历中,保证
start
位于
stop
之前。我想高效地找到/遍历这两个节点之间的所有节点。这意味着,如果我可以确定两者之间的节点不包含
stop
节点,我不想递归其所有后代。

这是一个众所周知的(命名的?)问题,它具有遍历算法,可能会计算共享祖先或类似的东西来确定是否遍历节点的后代?

示例0(同级):

<body>
  <p>ignore this</p>
  <div id="start">start here</div>
  <p id="A">visit A</p>
  <p id="B">visit B</p>
  <p id="C">visit C <span>but <b>not</b> <i>all this</i></span></p>
  <div id="stop">stop here</div>
  <p>ignore this</p>
</body>

在示例 0 中,我想按顺序访问以下内容:

  • #A
  • #B
  • #C
    (但不是其所有后代)

示例1(止损位更深):

<body>
  <p>ignore this</p>
  <div><h2><span id="start">start here</span> A1</h2> A2</div>
  <section id="B"><many><more><levels><of><content></many></section>
  <section id="C">
     <div id="D"><many><more><levels><of><content></many></div>
     <div id="E">
       <div id="F"><many><more><levels><of><content></many></div>
       <div id="stop">stop here</div>
       <p>ignore this</p>
     </div>
  </section>
  <p>ignore this</p>
</body>

在示例 1 中,我想按顺序访问以下内容:

  • 文本节点
    A1
    A2
  • #B
    (但不是其所有后代)
  • #C
    (或者也许不是?)
  • #D
    (但不是其所有后代)
  • #E
    (或者也许不是?)
  • #F
    (但不是其所有后代)

示例2(止损更高):

<body>
  <p>ignore this</p>
  <div>
    <h2>
      <span id="start">start here</span>
      <div id="A">visit A</div>
      <section id="B"><many><more><levels><of><content></many></section>
    </h2>
    <section id="C"><many><more><levels><of><content></many></section>
  </div>
  <section id="D"><many><more><levels><of><content></many></section>
  <div id="stop">stop here</div>
  <p>ignore this</p>
</body>

在示例 2 中,我想按顺序访问以下内容:

  • #A
  • #B
    (但不是其后代)
  • #C
    (但不是其后代)
  • #D
    (但不是其后代)
tree traversal
1个回答
0
投票

这是一个似乎可以做我想要的事情的算法,用伪代码表示:

results   = []
ancestors = get_ancestors_of(stop_node)
current   = start_node

while current and current != stop_node:
    if current in ancestors:
        current = current.children[0]
    else:
        if current != start_node:
            results.add(current)

        if current.next_sibling:
            current = current.next_sibling
        else:
            while current and !current.next_sibling:
                current = current.parent

            if current:
                current = current.next_sibling

return results

这是 Ruby 中的方法,它接受一个块并在找到节点时生成节点:

# Traverse nodes between two in the tree, yielding each to the block
# but without traversing the descendants that are wholly between the two.
def traverse_between(start_node, end_node)
    # Get all ancestors of the end_node
    end_node_ancestors = end_node ? end_node.ancestors.to_set : Set.new

    current = start_node

    while current && current != end_node
        # Traverse children if the end node is in there somewhere.
        # We don't yield ancestors of the end_node, since
        # they implicitly contain the end_node and possibly more.
        if end_node_ancestors.include?(current)
            current = current.element_children.first
        else
            yield current unless current == start_node

            if current.next_sibling
                current = current.next_sibling
            else
                # Move up to the next ancestor with a sibling; stop if we reach the root
                while current && !current.next_sibling
                    current = current.parent
                    return if current.is_a?(Nokogiri::XML::Document)
                end
                current = current.next_sibling if current
            end
        end
    end
end
最新问题
© www.soinside.com 2019 - 2025. All rights reserved.