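I want to extract the text inside the a tag, but I don't want the text in the span with class "newly" that says "New listing". Using XPath, how can I get just the following text:
NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!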
PHP SCRAPER
$document = new DOMDocument( '1.0', 'UTF-8' );
$document->preserveWhiteSpace = false;
$internalErrors = libxml_use_internal_errors( true );
$ebayhtml = file_get_contents( $ebayurl );
$document->loadHTML( $ebayhtml );
libxml_use_internal_errors( $internalErrors );
$xpath = new DOMXpath( $document );
$headers = $xpath->query( '//h3[@class="lvtitle"]/a' );
$ebayx = 0;
foreach ( $headers as $title ) {
    if ( $ebayx > 9 ) {
        break;
    } else {
        $header = $title->nodeValue . PHP_EOL;
        $header = strlen($header) > 60 ? substr($header,0,60) . "..." : $header;
        echo '<pre>';
        echo $header;
        echo '</pre>';
        $ebayx++;
    }
}
HTML CODE (SNIPPED)
<a href="https://www.ebay.com/itm/NEW-CALL-OF-DUTY-WWII-Microsoft-XBOX-ONE-DISC-2017-WW2-Factory-Sealed/173060343645?epid=237222746&hash=item284b33475d:g:Xf4AAOSwI8laCc~I" class="vip" title="Click this link to access NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!"><span class="newly">New listing</span>
NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!</a>
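If this XPath,
//h3[@class="lvtitle"]/a
selects the targeted a elements, then this XPath,
//h3[@class="lvtitle"]/a/text()
will select only their direct text-node children, thereby excluding the span child element as requested.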
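As a minimal sketch of how the text() approach might slot into the scraper above: the sample markup below is modelled on the snippet in the question, but the wrapping h3 with class "lvtitle" is inferred from the XPath rather than shown in the question, and the href is shortened for illustration. The `[normalize-space()]` predicate and the `trim()` call are additions that skip whitespace-only text nodes and strip the line break left behind after the span.

```php
<?php
// Sample markup modelled on the question's snippet.
// Assumptions: the <h3 class="lvtitle"> wrapper is inferred from the XPath,
// and the href is shortened for illustration.
$html = '<h3 class="lvtitle"><a href="https://www.ebay.com/itm/173060343645" class="vip">'
      . '<span class="newly">New listing</span>' . "\n"
      . 'NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!</a></h3>';

$document = new DOMDocument();
$internalErrors = libxml_use_internal_errors(true); // silence HTML parser warnings
$document->loadHTML($html);
libxml_use_internal_errors($internalErrors);        // restore the previous setting

$xpath = new DOMXPath($document);

// text() selects only the direct text-node children of each <a>, so the
// text inside the <span> child is never visited.
$titles = [];
foreach ($xpath->query('//h3[@class="lvtitle"]/a/text()[normalize-space()]') as $textNode) {
    $titles[] = trim($textNode->nodeValue);
}

print_r($titles);
```

In the original loop, the same effect should be achievable by changing only the query string, since each result is then a text node whose nodeValue no longer includes "New listing".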