How to extract the text between a tag and some closing tag using XPath

Question

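I have the following HTML. The class names are always the same. Only the text between the tags differs, and it varies in both length and content.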

<a>
    <span class="xxx">Not this text <span class="yyy">not this text</span> <span class="zzz">This is</span> the required text <q class="aaa">this not</q></span>
</a>

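How can I extract the content between the tag with class "zzz" and the end of the line, without including the element with class "aaa" in the result? Is that possible?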

The element with class "aaa" may or may not be present:

<a>
    <span class="xxx">Not this text <span class="yyy">not this text</span> <span class="zzz">This is</span> the required text</span>
</a>

The expected result is:

This is the required text

The "the required text" part may also be absent:

<a>
    <span class="xxx">Not this text <span class="yyy">not this text</span> <span class="zzz">This is</span></span>
</a>

In that case the result should be:

This is

I am trying this in PHP with DOMXPath.

Answer 1

I don't necessarily know how to do this with XPath, but here is a way to do it without XPath.

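// Recursively yields $node (unless $skipParent is true) followed by all of its descendants, in document order.
function walk(DOMNode $node, $skipParent = false) {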
    if (!$skipParent) {
        yield $node;
    }
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $n) {
            yield from walk($n);
        }
    }
}

$html = <<<'HTML'
<span class="xxx">
    Not this text
    <span class="yyy">not this text</span>
    <span class="zzz">This is</span>
    the required text
    <q class="aaa">this not</q>
</span>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

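$content = null; // stays null if no element with class "zzz" is found
$count = 0;      // number of nodes visited after the "zzz" element
foreach (walk($dom->firstChild) as $node) {
    // Only elements have attributes; comments and text nodes are skipped.
    if ($node instanceof DOMElement && $node->getAttribute('class') === 'xxx') {
        foreach (walk($node) as $n) {
            if (isset($content)) {
                $count++;
            }
            if ($n instanceof DOMElement && $n->getAttribute('class') === 'zzz') {
                $content = $n->textContent;
            }
            // The first node after "zzz" is its own text child. The second one,
            // if it is a text node, is the text that follows the closing tag.
            if (isset($content) && $n instanceof DOMText && $count == 2) {
                $content .= " " . $n->textContent;
                break 2;
            }
        }
    }
}

var_dump($content);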

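This gives you the desired text whether or not the "the required text" part is present. With the indented sample above, the appended text node still carries its newlines and indentation, so you may want to normalize the whitespace afterwards.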
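
For the XPath route the question asks about, a minimal sketch with DOMXPath might look like the following. The function name extractRequiredText is hypothetical, and the sketch assumes, as the question states, that the class attributes are exactly "xxx", "zzz" and "aaa", and that the wanted text sits in text nodes that are siblings of the "zzz" span. It selects the "zzz" span, then the following sibling text nodes that have no "aaa" element before them, and finally collapses whitespace.

```php
<?php
// Hypothetical helper, a sketch only: returns the text of the "zzz" span
// plus the sibling text that follows it, or null if there is no "zzz" span.
function extractRequiredText(string $html): ?string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // First "zzz" span that is a direct child of an "xxx" span.
    $zzz = $xpath->query('//span[@class="xxx"]/span[@class="zzz"]')->item(0);
    if ($zzz === null) {
        return null;
    }

    $parts = [$zzz->textContent];

    // Text nodes after the "zzz" span, evaluated relative to it. The predicate
    // drops any text node that comes after an element with class "aaa".
    $query = 'following-sibling::text()[not(preceding-sibling::*[@class="aaa"])]';
    foreach ($xpath->query($query, $zzz) as $textNode) {
        $parts[] = $textNode->nodeValue;
    }

    // Join the pieces and collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
}
```

Calling extractRequiredText() on the first two snippets from the question should return "This is the required text", and on the third one "This is". Because the "aaa" element is never selected, its text is left out without any extra filtering in PHP.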
