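Funnily enough, I haven't found a working example of this. Using PHP, I'm trying to scrape all the images from a given URL and redisplay them on another site. I know how to do this with text, but I'm not sure how to do it with images. Does anyone know of a good working example? I know how to grab the whole page, but not just the images. For example, this fetches the entire page: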
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "https://en.wikipedia.org/wiki/Wikipedia:Picture_of_the_day");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec ($curl);
curl_close ($curl);
echo $result;
?>
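Many thanks. -Wilson
*Ideally, this would grab only the first image, as on the example page above. But I won't get ahead of myself; I just want to get the basic functionality working first.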
You can save the result to a file.
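// $url and $filename must be set by the caller before this point.
// Size of any existing file, since 'a+' appends to it (assumed meaning of $begin_size).
$begin_size = file_exists($filename) ? filesize($filename) : 0;
$fp = fopen($filename, 'a+');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:29.0) Gecko/20100101 Firefox/29.0');
curl_setopt($ch, CURLOPT_ENCODING, 'gzip, deflate');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_FILE, $fp); // write the response body straight to the file
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_NOPROGRESS, false); // required for the progress callback to fire
// Since PHP 5.5 the callback receives the cURL handle as its first argument.
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, function ($ch, $dltotal, $dlnow, $ultotal, $ulnow) {
    return 0; // a non-zero return value would abort the transfer
});
// Abort if the transfer speed stays below 1 byte/s for 8 seconds.
curl_setopt($ch, CURLOPT_LOW_SPEED_LIMIT, 1);
curl_setopt($ch, CURLOPT_LOW_SPEED_TIME, 8);
curl_exec($ch);
$error = curl_error($ch);
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$content_type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
$end_size = $begin_size + curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD);
// Log is assumed to be a logging facade such as Laravel's; use error_log() in plain PHP.
Log::info('end_size='.$end_size);
curl_close($ch);
fclose($fp);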
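The snippet above collects `$error`, `$http_code` and `$content_type` but never checks them. Here is a minimal sketch of how they could be used to decide whether the saved file is really an image. The helper name `looksLikeImageResponse` is hypothetical, and treating only 2xx responses with an `image/*` content type as valid is an assumption, not something the answer states.

```php
<?php
// Hypothetical helper: decides whether a finished cURL transfer looks like
// a successfully downloaded image, based on values read from the handle.
function looksLikeImageResponse($error, $httpCode, $contentType)
{
    // curl_error() returns an empty string when no error occurred.
    if ($error !== '') {
        return false;
    }
    // Assumption: only 2xx status codes count as success.
    if ($httpCode < 200 || $httpCode >= 300) {
        return false;
    }
    // CURLINFO_CONTENT_TYPE may not be a string when the server sent no
    // valid Content-Type header, so check the type before inspecting it.
    return is_string($contentType) && stripos(trim($contentType), 'image/') === 0;
}
```

After `curl_close($ch)`, calling `looksLikeImageResponse($error, $http_code, $content_type)` and deleting `$filename` when it returns `false` would keep HTML error pages from being stored as images.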
You can use PHP's DOMXPath class to parse the scraped result.
Add the following script after your code:
$dom = new DOMDocument();
@$dom->loadHTML($result);
$xpath = new DOMXPath($dom);
//get all images
$images = $xpath->query ('//img/@src');
$img = array();
foreach ( $images as $image) {
$img[] = $image->nodeValue;
}
print_r($img);
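The `src` values collected this way are often relative or protocol-relative (for example `//upload.wikimedia.org/...`), so they usually need to be made absolute before they can be displayed on another site. The following is a rough sketch of that step under simplifying assumptions: the function name `extractImageUrls` is hypothetical, the base URL is assumed to contain a scheme and host, and ports, `<base>` tags and `../` segments are ignored.

```php
<?php
// Hypothetical helper: returns the image URLs found in $html, resolved
// against $baseUrl. Simplified: ignores ports, <base> tags and "../".
function extractImageUrls(string $html, string $baseUrl): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about malformed HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = parse_url($baseUrl);
    $origin = $base['scheme'] . '://' . $base['host'];
    $path = $base['path'] ?? '/';
    // Directory part of the base path, including the trailing slash.
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    $xpath = new DOMXPath($dom);
    $urls = [];
    foreach ($xpath->query('//img/@src') as $attr) {
        $src = trim($attr->nodeValue);
        if ($src === '') {
            continue;
        }
        if (strpos($src, '//') === 0) {
            // Protocol-relative: reuse the scheme of the page.
            $urls[] = $base['scheme'] . ':' . $src;
        } elseif (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $src)) {
            // Already has a scheme (http:, https:, data:, ...): keep as-is.
            $urls[] = $src;
        } elseif ($src[0] === '/') {
            // Root-relative path.
            $urls[] = $origin . $src;
        } else {
            // Relative to the directory of the page.
            $urls[] = $origin . $dir . $src;
        }
    }
    return $urls;
}
```

To get only the first image, as the question asks, something like `$first = extractImageUrls($result, $url)[0] ?? null;` would do. `$url` here stands for the address of the scraped page.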
Try changing your cURL code to this:
$url = 'https://en.wikipedia.org/wiki/Wikipedia:Picture_of_the_day';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0); // for https: skips the host name check (insecure)
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0); // for https: skips certificate verification (insecure)
//curl_setopt($curl, CURLOPT_ENCODING , 'gzip');
$html = curl_exec($curl);
if(curl_error($curl)){
echo 'Curl error: ' . curl_error($curl);
$result = ''; //return empty if error
}
else {
$result = $html;
}
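With `$result` holding the page HTML and the image URLs pulled out of it, redisplaying them on another page comes down to printing `<img>` tags. The sketch below shows one way this might look. The function name `renderImageTags` is hypothetical, and it assumes the URLs are already absolute and that the remote host allows its images to be embedded elsewhere.

```php
<?php
// Hypothetical helper: turns a list of absolute image URLs into <img> tags.
function renderImageTags(array $urls): string
{
    $out = '';
    foreach ($urls as $url) {
        // Escape the URL so characters such as & and " cannot break the markup.
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $out .= '<img src="' . $safe . '" alt="">' . "\n";
    }
    return $out;
}
```

Printing `renderImageTags(array_slice($urls, 0, 1))` would show only the first image, where `$urls` stands for the list of absolute image URLs.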