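How can I scrape a search result list whose entries do not all contain the same fields?
Here is an example:
This search returns 4 businesses: https://www.11880.com/suche/0521441422/deutschland
Not all of these 4 businesses include opening hours: the first one has none, while the last 3 do.
So when I run the script below, the opening hours are attached to the wrong businesses: they get "joined" to the first 3 businesses instead of the last 3.
How can I modify the script so that the opening hours are attached to the correct business?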
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20120101 Firefox/33.0');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, 'https://www.11880.com/suche/0521441422/deutschland');
$page = curl_exec($ch);
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
$results = [];
$results['name'] = $xpath->query('//h2[@itemprop="name"]');
$results['street'] = $xpath->query('//span[@class="street-address"]');
$results['zipcode'] = $xpath->query('//span[@class="postal-code"]');
$results['city'] = $xpath->query('//span[@class="address-locality"]');
$results['district'] = $xpath->query('//span[@class="quarter"]');
$results['opening_hours'] = $xpath->query('//span[@class="open-or-closed"]');
// Location of the opening-hours element of the third result (XPath, then CSS selector):
//*[@id="html-search-result-list"]/li[3]/div/div[3]/div[1]/span[1]
#html-search-result-list > li:nth-child(3) > div > div.row-result-entry--bottom.row > div.col-result-entry-content--contactinfos.hidden-xs.col-sm-8 > span.btn-ghost.btn-ghost-primary.btn-result-entry-interaction.open-or-closed.open
for ($x = 0; $x < $results['name']->length; $x++)
{
    echo trim($results['name']->item($x)->textContent) . ";";
    echo trim($results['street']->item($x)->textContent) . ";";
    echo trim($results['zipcode']->item($x)->textContent) . ";";
    echo trim($results['city']->item($x)->textContent) . ";";
    echo trim($results['district']->item($x)->textContent) . ";";
    echo trim($results['opening_hours']->item($x)->textContent) . "<br>\n";
}
?>
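You can do it like this: find the container element of each result first, then query each field relative to that container, so a missing field stays empty instead of shifting the values of later entries. This is only a draft: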
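// Find the parent div of each result; it must enclose every field you want to read
$divs = $xpath->query('//h2[@itemprop="name"]/ancestor::div[1]');
for ($x = 0; $x < $divs->length; $x++) {
    $entry = $divs->item($x);
    // Find the items you want inside this div only (note the leading "." in the XPath)
    $name = $xpath->query('.//h2[@itemprop="name"]', $entry);
    // query() returns an empty DOMNodeList when nothing matches, so check its length
    if ($name !== false && $name->length > 0) {
        echo trim($name->item(0)->textContent) . ";";
    }
    // ... repeat for the other fields, e.g. './/span[@class="open-or-closed"]'
}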
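To make the per-entry idea concrete, here is a minimal, self-contained sketch. It runs against made-up sample markup rather than the live page. The class names come from the question, but the sample data, the `firstText` helper, and the choice of `<li>` as the entry container are assumptions. The CSS selector in the question suggests each result is an `<li>` under `#html-search-result-list`, but that has not been verified against the real page.

```php
<?php
// Hypothetical sample markup: class names follow the question, but the
// structure and the data are invented for illustration.
$html = <<<HTML
<ul id="html-search-result-list">
  <li><h2 itemprop="name">Business A</h2>
      <span class="street-address">Street 1</span></li>
  <li><h2 itemprop="name">Business B</h2>
      <span class="street-address">Street 2</span>
      <span class="open-or-closed">Open</span></li>
</ul>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: trimmed text of the first match inside $context,
// or an empty string when the entry has no such element.
function firstText(DOMXPath $xpath, string $query, DOMNode $context): string
{
    $nodes = $xpath->query($query, $context);
    return ($nodes !== false && $nodes->length > 0)
        ? trim($nodes->item(0)->textContent)
        : '';
}

$rows = [];
// One iteration per result entry; every field is looked up inside that entry.
foreach ($xpath->query('//*[@id="html-search-result-list"]/li') as $entry) {
    $rows[] = [
        'name'          => firstText($xpath, './/h2[@itemprop="name"]', $entry),
        'street'        => firstText($xpath, './/span[@class="street-address"]', $entry),
        'opening_hours' => firstText($xpath, './/span[@class="open-or-closed"]', $entry),
    ];
}

foreach ($rows as $row) {
    echo implode(';', $row) . "\n";
}
```

With this sample, the first row ends in an empty opening-hours column and the second row carries "Open", so each value stays with its own business. Note that `@class="open-or-closed"` only matches when the attribute is exactly that string. If the real element carries several classes, as the CSS selector in the question indicates, a test such as `contains(concat(" ", normalize-space(@class), " "), " open-or-closed ")` would be needed instead.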