Excluding certain child elements of a class when using XPath

Question (votes: 2, answers: 1)

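I want to extract the text inside an a tag, but I don't want the text from the span with class "newly", which says "New listing". Using XPath, how can I get just the following text: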

NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!

PHP SCRAPER

$document = new DOMDocument( '1.0', 'UTF-8' );
$document->preserveWhiteSpace = false;
$internalErrors = libxml_use_internal_errors( true );
$ebayhtml = file_get_contents( $ebayurl );
$document->loadHTML( $ebayhtml );
libxml_use_internal_errors( $internalErrors );

$xpath = new DOMXpath( $document );
$headers = $xpath->query( '//h3[@class="lvtitle"]/a' );
$ebayx = 0;

foreach ( $headers as $title ) {
    if ( $ebayx > 9 ) {
        break;
    } else {
        $header = $title->nodeValue . PHP_EOL;
        $header = strlen($header) > 60 ? substr($header,0,60) . "..." : $header;
        echo '<pre>';
        echo $header;
        echo '</pre>';
        $ebayx++;
    }
}

HTML CODE BEING SCRAPED

<a href="https://www.ebay.com/itm/NEW-CALL-OF-DUTY-WWII-Microsoft-XBOX-ONE-DISC-2017-WW2-Factory-Sealed/173060343645?epid=237222746&amp;hash=item284b33475d:g:Xf4AAOSwI8laCc~I" class="vip" title="Click this link to access NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!"><span class="newly">New listing</span>
        NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!</a>
php xpath web-scraping
1 Answer
2 votes

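If this XPath,

//h3[@class="lvtitle"]/a

selects the target a elements, then this XPath,

//h3[@class="lvtitle"]/a/text()

will select only their immediate text node children, thereby excluding the span child element as requested. Note that the selected text nodes keep any surrounding whitespace from the markup (here, the line break and indentation after the span), so you may want to trim the value.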
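
As a rough illustration of how this could fit into the scraper from the question, here is a minimal sketch. It assumes the markup looks like the snippet above, wrapped in an h3 with class "lvtitle" as the question's XPath implies. The inline $html string, the placeholder href, and the $titles array are made up for the example and stand in for the page fetched from $ebayurl.

```php
<?php
// Sample markup modeled on the snippet in the question.
// Assumption: the <a> sits inside <h3 class="lvtitle">; the href is a placeholder.
$html = <<<HTML
<h3 class="lvtitle"><a href="#" class="vip"><span class="newly">New listing</span>
        NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!</a></h3>
HTML;

$document = new DOMDocument();
$previous = libxml_use_internal_errors(true); // silence warnings from real-world HTML
$document->loadHTML($html);
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($document);

$titles = [];
foreach ($xpath->query('//h3[@class="lvtitle"]/a') as $anchor) {
    // text() evaluated relative to this <a> yields only its direct text-node
    // children, so the text inside the <span> is not included.
    $parts = [];
    foreach ($xpath->query('text()', $anchor) as $textNode) {
        $parts[] = $textNode->nodeValue;
    }
    // Collapse the line break and indentation left over from the markup.
    $titles[] = trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
}

print_r($titles);
```

Querying text() relative to each a element, rather than running //h3[@class="lvtitle"]/a/text() once over the whole document, keeps the text nodes grouped per listing. That matters if a title ever happens to be split into several text nodes.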
