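I want to parse the following HTML with PHP:
https://pastebin.com/raw/5Z59HTcW
The problem is that it shouldn't handle just one line at a time; instead, all 3 span elements should be parsed together inside one foreach.
This is my current code: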
$json_object= file_get_contents($url);
$json_decoded = json_decode($json_object);
preg_match_all('/<span class="(name|price|description)">(.*)<\/span>/',$json_decoded->results_html, $sor);
foreach($sor[1] as $k => $v)
{
    echo "Name" .$v[0]."<br/>";
    echo "price" .$v[1]."<br/>";
    echo "des" .$v[2]."<br/>";
}
Parsing HTML with DOMDocument is fairly straightforward; in this case you don't even need XPath:
$url='https://pastebin.com/raw/5Z59HTcW';
$dom=new DOMDocument;
$dom->loadHTMLFile( $url );
$col=$dom->getElementsByTagName('span');
if ($col->length > 0) {
    foreach ($col as $span) {
        echo $span->getAttribute('class') . ' ' . $span->nodeValue . '<br />';
    }
}
Output:
name Test1
description testtest1
price 1 USD
name Test2
description testtest2
price 2 USD
name Test3
description testtest3
price 3 USD
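The DOMDocument loop above prints each span on its own line. If you want one record per item (name, description and price together), a minimal sketch, assuming that every item begins with a span whose class is `name` as in the pastebin sample, groups the spans by their class attribute. The inline `$html` string stands in for the remote page, and `$items` and `$current` are hypothetical names.

```php
<?php
// Sample markup from the question; assumed to match the pastebin page.
$html = '<span class="name">Test1</span>
<span class="description">testtest1</span>
<span class="price">1 USD</span>
<span class="name">Test2</span>
<span class="description">testtest2</span>
<span class="price">2 USD</span>
<span class="name">Test3</span>
<span class="description">testtest3</span>
<span class="price">3 USD</span>';

$dom = new DOMDocument();
// Collect libxml warnings about the fragment instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$items = [];
$current = null;
foreach ($dom->getElementsByTagName('span') as $span) {
    $class = $span->getAttribute('class');
    // Assumption: a "name" span opens a new item.
    if ($class === 'name') {
        if ($current !== null) {
            $items[] = $current;
        }
        $current = [];
    }
    // Store the span text under its class name (name, description, price).
    if ($current !== null) {
        $current[$class] = trim($span->textContent);
    }
}
// Flush the last item after the loop ends.
if ($current !== null) {
    $items[] = $current;
}

foreach ($items as $item) {
    echo 'Name: ' . $item['name'] . ', description: ' . $item['description']
        . ', price: ' . $item['price'] . PHP_EOL;
}
```

Keying by class rather than by position means the inner order of description and price within an item does not matter, as long as each item still starts with its `name` span.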
Alternatively, you can load the markup from a string with loadHTML and iterate over it:
$htmlContent = '<span class="name">Test1</span>
<span class="description">testtest1</span>
<span class="price">1 USD</span>
<span class="name">Test2</span>
<span class="description">testtest2</span>
<span class="price">2 USD</span>
<span class="name">Test3</span>
<span class="description">testtest3</span>
<span class="price">3 USD</span>';
$DOM = new DOMDocument();
$DOM->loadHTML($htmlContent);
$Header = $DOM->getElementsByTagName('span');
// Collect the trimmed text of every span, in document order
$aDataTableHeaderHTML = [];
foreach ($Header as $NodeHeader) {
    $aDataTableHeaderHTML[] = trim($NodeHeader->textContent);
}
echo '<pre>';
print_r($Header);
echo '</pre>';
echo '<table border="1"><thead><tr><td>Name</td><td>Desc</td><td>Price</td></tr></thead><tbody>';
// Every 3 consecutive spans (name, description, price) form one table row
foreach (array_chunk($aDataTableHeaderHTML, 3) as $v) {
    echo "<tr>
    <td>$v[0]</td>
    <td>$v[1]</td>
    <td>$v[2]</td>
</tr>";
}
echo '</tbody></table>';
Output:
DOMNodeList Object
(
[length] => 9
)
Name Desc Price
Test1 testtest1 1 USD
Test2 testtest2 2 USD
Test3 testtest3 3 USD
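In the question the HTML arrives inside a JSON response as `results_html`. A minimal sketch, assuming that field holds the span markup and that the three classes line up one-to-one per item, decodes the JSON, loads the fragment, and uses DOMXPath to select each class separately before walking them by index. The inline `$json` string is a hypothetical stand-in for `file_get_contents($url)`, and `$rows` is a hypothetical name.

```php
<?php
// Hypothetical stand-in for file_get_contents($url) from the question.
$json = json_encode([
    'results_html' => '<span class="name">Test1</span><span class="description">testtest1</span><span class="price">1 USD</span>'
        . '<span class="name">Test2</span><span class="description">testtest2</span><span class="price">2 USD</span>',
]);

$data = json_decode($json);

$dom = new DOMDocument();
// Collect libxml warnings about the fragment instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($data->results_html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$names = $xpath->query('//span[@class="name"]');
$descriptions = $xpath->query('//span[@class="description"]');
$prices = $xpath->query('//span[@class="price"]');

$rows = [];
// Assumption: each item has exactly one span of each class, in the same order.
for ($i = 0; $i < $names->length; $i++) {
    $rows[] = [
        'name' => trim($names->item($i)->textContent),
        'description' => trim($descriptions->item($i)->textContent),
        'price' => trim($prices->item($i)->textContent),
    ];
}

foreach ($rows as $row) {
    echo $row['name'] . ' | ' . $row['description'] . ' | ' . $row['price'] . PHP_EOL;
}
```

The `@class="name"` test is an exact string match, so a span carrying several classes would need `contains(concat(' ', normalize-space(@class), ' '), ' name ')` instead.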