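I get a 403 error when downloading a JSON file from a URL with PHP. I can open the file in a browser without any problem (no errors in the dev tools).
Here is the script (minus the actual URL), which I have verified works with other websites: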
$ch = curl_init($url);
$dir = '../sources/';
$file_name = basename($url);
$save_file_loc = $dir . $file_name;
$fp = fopen($save_file_loc, 'wb');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
// curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_CAINFO, 'C:/Program Files/php/cacert.pem');
curl_exec($ch);
echo curl_errno($ch)."\n";
print_r(curl_getinfo($ch));
curl_close($ch);
fclose($fp);
The certificate bundle comes from https://curl.se/docs/caextract.html. This is the info I get:
[url] => the/url/path/to/json/file
[content_type] => text/html
[http_code] => 403
[header_size] => 183
[request_size] => 83
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.516273
[namelookup_time] => 0.065622
[connect_time] => 0.091481
[pretransfer_time] => 0.483309
[size_upload] => 0
[size_download] => 418
[speed_download] => 809
[speed_upload] => 0
[download_content_length] => 418
[upload_content_length] => 0
[starttransfer_time] => 0.516216
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => (the.url.primary.ip.address)
[certinfo] => Array
(
)
[primary_port] => 443
[local_ip] => (my.local.ip)
[local_port] => 65046
[http_version] => 3
[protocol] => 2
[ssl_verifyresult] => 0
[scheme] => HTTPS
[appconnect_time_us] => 483121
[connect_time_us] => 91481
[namelookup_time_us] => 65622
[pretransfer_time_us] => 483309
[redirect_time_us] => 0
[starttransfer_time_us] => 516216
[total_time_us] => 516273
[effective_method] => GET
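Of course I also tried setting a user agent, but no luck.
What am I missing?
403 Forbidden.
The server is probably configured to refuse your request. I would contact the site owners first to find out whether you are allowed to fetch the file with a script, and if so, how.
If their checks are particularly weak then the following may work. If not, you are in for a lot of pain trying to trick their server into thinking you are a legitimate client: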
$url = ''; //url to scrape
$referrer = ''; //sensible url such as the homepage of their site
$dir = '../sources/'; //directory to store file
$curl = curl_init();
$file_name = basename($url);
$save_file_loc = $dir . $file_name;
$fp = fopen($save_file_loc, 'wb');
curl_setopt($curl, CURLOPT_FILE, $fp);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0');
$header = array();
$header[] = "Accept: */*";
$header[] = "Connection: keep-alive";
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_REFERER, $referrer);
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
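// The next two lines turn off TLS certificate and host name checks.
// They are not needed if CURLOPT_CAINFO points at a valid CA bundle.
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);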
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($curl, CURLOPT_TIMEOUT, 10);
curl_exec($curl);
echo curl_errno($curl)."\n";
print_r(curl_getinfo($curl));
curl_close($curl);
fclose($fp);
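One more thing worth checking: the info in the question shows 418 bytes of text/html coming back with the 403, so the script is saving the server's error page under the .json file name. The sketch below is only a rough illustration of buffering the response first and saving it only when it really is JSON. The function names fetch_with_status and is_json_download are made up for this example, and the user agent string is the one from the code above. Printing the body on failure may show the reason the server gives for refusing the request.

```php
<?php
// Hypothetical helper: fetch a URL into memory instead of straight into a file.
function fetch_with_status(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    $result = [
        'errno'        => curl_errno($ch),
        'http_code'    => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'content_type' => (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE),
        'body'         => $body === false ? '' : $body,
    ];
    curl_close($ch);
    return $result;
}

// Hypothetical helper: true only for a clean transfer, HTTP 200, and a body that parses as JSON.
function is_json_download(array $result): bool
{
    if ($result['errno'] !== 0 || $result['http_code'] !== 200) {
        return false;
    }
    json_decode($result['body']);
    return json_last_error() === JSON_ERROR_NONE;
}

// Usage (assumes $url and $dir are set as in the code above):
// $ua = 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0';
// $result = fetch_with_status($url, $ua);
// if (is_json_download($result)) {
//     file_put_contents($dir . basename($url), $result['body']);
// } else {
//     echo $result['http_code'] . "\n" . $result['body'] . "\n"; // read the server's error page
// }
```

This does not get around the 403 by itself. It only keeps error pages out of the sources directory and makes the server's response easier to read.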