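I'm trying to use PHP's DOMXPath::query to access the values of a table on a web page. When I open the page in a web browser I can see the table, but when I run the query the table is not visible and does not seem to be accessible.
The table has an ID, but when I specify it in the query, a different one is returned. I want to read the table with ID "totals", but I only get a table with ID "per_game". When I inspect the page source, a lot of elements appear to be inside comments.
Here is my script: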
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('https://www.basketball-reference.com/players/j/jokicni01.html');
$xpath = new DOMXPath($doc);
$table = $xpath->query("//div[@id='totals']")->item(0);
$elem = $doc->saveXML($table);
echo $elem;
?>
How can I read the elements of the table with ID 'totals'?
The full path is /html/body/div[@id="wrap"]/div[@id="content"]/div[@id="all_totals"]/div[@class="table_outer_container"]/div[@id="div_totals"]/table[@id="totals"]
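You can split the query into two parts: first, retrieve the comment inside the right div, then create a new document from its content and retrieve the element you want from that: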
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
@$doc->loadHTMLFile('https://www.basketball-reference.com/players/j/jokicni01.html');
$xpath = new DOMXPath($doc);
// retrieve the comment section in 'all_totals' div
$all_totals_element = $xpath->query('/html/body/div[@id="wrap"]/div[@id="content"]/div[@id="all_totals"]/comment()')->item(0);
$all_totals_table = $doc->saveXML($all_totals_element);
// strip comment tags to keep the content inside
$all_totals_table = substr($all_totals_table, strpos($all_totals_table, '<!--') + strlen('<!--'));
$all_totals_table = substr($all_totals_table, 0, strpos($all_totals_table, '-->'));
// create a new Document with the content of the comment
$tableDoc = new DOMDocument ;
$tableDoc->loadHTML($all_totals_table);
$xpath = new DOMXPath($tableDoc);
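// second part of the query; loadHTML() wraps the fragment in <html><body>,
// so match the div anywhere in the new document instead of at the root
$totals = $xpath->query('//div[@class="table_outer_container"]/div[@id="div_totals"]/table[@id="totals"]')->item(0);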
echo $tableDoc->saveXML($totals) ;
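As a variation on the same two-step idea, here is a minimal sketch that reads the comment's text through nodeValue instead of stripping the delimiters with substr, and then collects the cell text row by row. The helper name readCommentedTable and the sample markup are made up for illustration; the sample only imitates the structure described in the question so that the snippet runs without a network request.

```php
<?php
// Hypothetical helper (not part of the original answer): finds a table that is
// shipped inside an HTML comment and returns its rows as arrays of cell text.
// The IDs are interpolated into the XPath as-is, so they must not contain quotes.
function readCommentedTable(string $html, string $containerId, string $tableId): array
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Step 1: the first comment node that is a direct child of the container div.
    $comment = $xpath->query("//div[@id='$containerId']/comment()")->item(0);

    $rows = [];
    if ($comment !== null && trim($comment->nodeValue) !== '') {
        // nodeValue of a comment node is its text without the <!-- --> delimiters.
        $inner = new DOMDocument();
        $inner->loadHTML($comment->nodeValue);
        $innerXpath = new DOMXPath($inner);

        // Step 2: query the re-parsed fragment; '//' is used because loadHTML()
        // wraps the fragment in <html><body>.
        foreach ($innerXpath->query("//table[@id='$tableId']//tr") as $tr) {
            $cells = [];
            foreach ($innerXpath->query('./th | ./td', $tr) as $cell) {
                $cells[] = trim($cell->textContent);
            }
            $rows[] = $cells;
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $rows;
}

// Made-up sample markup that mimics the commented-out table on the page.
$sample = <<<HTML
<html><body><div id="wrap"><div id="content"><div id="all_totals">
<!--
<div class="table_outer_container"><div id="div_totals">
<table id="totals"><tr><th>Season</th><th>PTS</th></tr><tr><td>A</td><td>10</td></tr></table>
</div></div>
-->
</div></div></div></body></html>
HTML;

$rows = readCommentedTable($sample, 'all_totals', 'totals');
print_r($rows);
```

For the real page you would pass the downloaded HTML (for example from file_get_contents) instead of $sample; whether the live markup still matches this structure is an assumption you would need to check.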