How to get all the HTML in a document or node that contains shadowRoot elements

Question

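I have not seen a satisfactory answer to this question. It is essentially a duplicate of this question, but that one was improperly closed and the answers given were inadequate.

I came up with my own solution, which I will post below.

This can be useful for web scraping or, in my case, for running tests on a JavaScript library that deals with custom elements. I make sure the library produces the output I want, then use this function to scrape the HTML of a given test's output and keep that copied HTML as the expected output to compare against in future test runs.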
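The comparison step described above could look roughly like the sketch below. The helper names `normalizeHtml` and `matchesSnapshot` are hypothetical and not part of the question; the sketch assumes that whitespace between tags is not significant for the test.

```javascript
// Hypothetical helper: collapse whitespace so that formatting differences
// between two HTML strings do not cause a false mismatch.
function normalizeHtml(html) {
    return html
        .replace(/>\s+</g, '><')   // drop whitespace between adjacent tags
        .replace(/\s+/g, ' ')      // collapse remaining whitespace runs
        .trim()
}

// Hypothetical helper: compare freshly extracted HTML with a stored snapshot.
function matchesSnapshot(actualHtml, expectedHtml) {
    return normalizeHtml(actualHtml) === normalizeHtml(expectedHtml)
}
```

In a test, `actualHtml` would be the string returned by the extraction function and `expectedHtml` the copy saved from an earlier, manually verified run.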

Answer 1

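Here is a function that does what was asked. Note that it ignores HTML comments and other fringe content, but it does retrieve regular elements, text nodes, and custom elements with shadowRoots. It also handles slotted template content. It has not been exhaustively tested, but it seems to work well for my needs.

Use it like this:

extractHTML(document.body)

or

extractHTML(document.getElementById('app'))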

function extractHTML(node) {
            
    // return a blank string if not a valid node
    if (!node) return ''

    // if it is a text node just return the trimmed textContent
    if (node.nodeType===3) return node.textContent.trim()

    //beyond here, only deal with element nodes
    if (node.nodeType!==1) return ''

    let html = ''

    // clone the node for its outer html sans inner html
    let outer = node.cloneNode()

    // if the node has a shadowroot, jump into it
    node = node.shadowRoot || node
    
    if (node.children.length) {
        
        // we checked for children but now iterate over childNodes
        // which includes #text nodes (and even other things)
        for (let n of node.childNodes) {
            
            // if the node is a slot
            if (n.assignedNodes) {
                
                const assigned = n.assignedNodes()

                // an assigned slot: a slot can have several assigned nodes,
                // so recurse over all of them
                if (assigned.length) {
                    for (const a of assigned) html += extractHTML(a)

                // an unassigned slot
                } else { html += n.innerHTML }

            // node is not a slot, recurse
            } else { html += extractHTML(n) }
        }

    // node has no children
    } else { html = node.innerHTML }

    // insert all the (children's) innerHTML 
    // into the (cloned) parent element
    // and return the whole package
    outer.innerHTML = html
    return outer.outerHTML
    
}
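The function above drops comments, and it writes trimmed text content back through innerHTML unescaped, so text containing `<` or `&` may be re-parsed as markup. If that matters for your use, the non-element cases could be handled by something like the following sketch. `escapeText` and `serializeLeaf` are hypothetical names, not part of the answer.

```javascript
// Escape the characters that would otherwise be parsed as markup
// when the text is written back through innerHTML.
function escapeText(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
}

// Hypothetical helper: serialize a non-element node.
// nodeType 3 is a text node, nodeType 8 is a comment node.
function serializeLeaf(node) {
    if (!node) return ''
    if (node.nodeType === 3) return escapeText(node.textContent.trim())
    if (node.nodeType === 8) return '<!--' + node.textContent + '-->'
    return ''
}
```

The two early returns for text and other non-element nodes in extractHTML could then be replaced by a single check such as `if (node.nodeType !== 1) return serializeLeaf(node)`.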

Answer 2

You can only access shadowRoots from the outside if they were created with the

mode:"open"

setting.

You can then dive into elements and shadowRoots with something like:

 const shadowDive = (
          el, 
          selector, 
          match = (m, r) => console.warn('match', m, r)
  ) => {
    let root = el.shadowRoot || el;
    root.querySelector(selector) && match(root.querySelector(selector), root);
    [...root.children].map(el => shadowDive(el, selector, match));
  }

Note: if the Web Component's styling depends on shadow DOM behavior, extracting the raw HTML makes little sense; you will lose all the correct styling.
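As a variation on the traversal above, the sketch below collects every element that exposes an open shadow root instead of matching a selector. `collectShadowHosts` is a hypothetical name. Unlike shadowDive, it also walks the light DOM children of each host. Hosts whose root was created with mode:"closed" are skipped, because their `shadowRoot` property is null.

```javascript
// Hypothetical helper: return every element under `el` (inclusive) that
// exposes an open shadow root. Closed roots are invisible here because
// `shadowRoot` is null for them.
function collectShadowHosts(el, found = []) {
    if (el.shadowRoot) {
        found.push(el)
        // descend into the shadow tree
        for (const child of el.shadowRoot.children) collectShadowHosts(child, found)
    }
    // descend into the light DOM children as well
    for (const child of el.children) collectShadowHosts(child, found)
    return found
}
```

Calling `collectShadowHosts(document.body)` in a browser would return the hosts in document order, each host appearing before the hosts nested inside its shadow tree.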

Answer 3

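For me, Moss's solution almost worked; the only problem was that slots were not included in the output. I used an LLM to improve the answer and tested the result, and slots are now included, which is what I needed. For anyone else who runs into this and needs slots, here is the code: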

function extractHTML(node) {

    // return a blank string if not a valid node
    if (!node) return '';

    // if it is a text node just return the trimmed textContent
    if (node.nodeType === 3) return node.textContent.trim();

    // beyond here, only deal with element nodes
    if (node.nodeType !== 1) return '';

    let html = '';

    // clone the node for its outer html sans inner html
    let outer = node.cloneNode();

    // if the node has a shadowroot, jump into it
    node = node.shadowRoot || node;

    // childNodes includes element children, so one check is enough
    if (node.childNodes.length) {

        // iterate over childNodes which includes #text nodes and slots
        for (let n of node.childNodes) {

            // if the node is a slot
            if (n.nodeName === 'SLOT') {

                // check if slot has assigned nodes
                let assignedNodes = n.assignedNodes();
                if (assignedNodes.length > 0) {
                    // if there are assigned nodes, recurse over them
                    for (let assignedNode of assignedNodes) {
                        html += extractHTML(assignedNode);
                    }
                } else {
                    // if no assigned nodes, preserve the <slot> element itself
                    html += n.outerHTML;
                }

            // node is not a slot, recurse normally
            } else {
                html += extractHTML(n);
            }
        }
    } else {
        // node has no children, insert its innerHTML
        html = node.innerHTML;
    }

    // insert all the (children's) innerHTML
    // into the (cloned) parent element
    // and return the whole package
    outer.innerHTML = html;
    return outer.outerHTML;

}
