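Hi, I want to scrape the price of an Amazon product, but when I request the page through HTML DOM it shows a blank page. If I use an AliExpress link instead, it works fine.
For example: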
$value = "https://www.amazon.com/Apple-iPhone-Plus-Unlocked-32GB/dp/B01N6ZAR0D/";
$html = file_get_html($value);
echo $html;
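Requesting the page directly through the HTML DOM client is not recommended, especially for a large site like Amazon. Sites like Amazon inspect the client's user agent, cookies, and header information for security purposes and to detect whether the client is a bot.
So you should use curl or Guzzle to request the web page and supply the necessary request headers. After the request, take the response string and parse it with str_get_html.
Example: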
// assuming $client is a GuzzleHttp\Client instance
$response = $client->request('GET', $url);
$html = str_get_html((string) $response->getBody());
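The advice above can also be sketched with PHP's built-in cURL extension instead of a wrapper library. This is only a minimal sketch, not the answer's own code: the function names `browser_headers` and `fetch_html` are hypothetical, and the header values are just examples of what a browser might send. Sending them does not guarantee that Amazon will serve the page.

```php
<?php
// Hypothetical helper: header lines resembling a desktop browser's request.
function browser_headers(): array
{
    return [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.9',
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    ];
}

// Hypothetical helper: fetch a page with the cURL extension; returns null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_ENCODING       => '',     // empty string: accept every encoding this curl build can decode
        CURLOPT_HTTPHEADER     => browser_headers(),
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Usage (requires network access and simple_html_dom.php):
// $body = fetch_html($url);
// $html = $body !== null ? str_get_html($body) : false;
```

Leaving `CURLOPT_ENCODING` empty lets curl advertise only the compression formats it can actually decode, so no `Accept-Encoding` header is set by hand here.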
A real working example for your question (click this link to get the code on GitHub):
require __DIR__ . '/vendor/autoload.php';
require 'simple_html_dom.php';
use Curl\Curl;
// initialize curl
// you can install via "composer require php-curl-class/php-curl-class"
$curl = new Curl();
// set cookies
$curl->setCookieFile(__DIR__ . '/cookies.txt');
$curl->setCookieJar(__DIR__ . '/cookies.txt');
// decode gzip-encoded responses, because Amazon compresses its pages with gzip
$curl->setOpt(CURLOPT_ENCODING , "gzip");
// set request header like a browser
$curl->setHeaders([
'accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
// advertise only gzip, so the server does not reply with an encoding curl was not asked to decode
'accept-encoding' => 'gzip',
'accept-language' => 'en,tr;q=0.9',
'user-agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
]);
// request
$curl->get('https://www.amazon.com/Apple-iPhone-Plus-Unlocked-32GB/dp/B01N6ZAR0D/');
// get raw response
$response = $curl->getRawResponse();
// parser
$html = new simple_html_dom();
// load from string html
$html->load($response);
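// find the price element and print it; find() returns null when nothing matches
$priceNode = $html->find('#price', 0);
echo $priceNode ? $priceNode->plaintext : 'Price element not found';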
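Because the live page can be blocked or change its markup, it may help to check the lookup offline against a saved HTML string first. The sketch below deliberately swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath so that it runs without extra files. The function name `extract_price` and the sample markup are made up for illustration, and the `price` id is taken from the example above rather than verified against Amazon's current pages.

```php
<?php
// Hypothetical helper: return the trimmed text of the element with id="price", or null.
function extract_price(string $markup): ?string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on real-world HTML
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $nodes = (new DOMXPath($doc))->query('//*[@id="price"]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no matching element, e.g. a bot-check page was returned
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up sample markup, not a real Amazon page.
$sample = '<div><span id="price"> $19.99 </span></div>';
var_dump(extract_price($sample));            // the trimmed price text
var_dump(extract_price('<p>no price</p>'));  // null, because nothing matched
```

Returning null instead of dereferencing a missing node makes it easy to tell a blocked or changed page apart from a successful scrape.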