I need to download this XML with cron on an Ubuntu 22 server: https://www.sbs.gob.pe/app/xmltipocambio/TC_TI_Portal_xml.xml
I tried using PHP with cookies and headers:
<?php
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => 'https://www.sbs.gob.pe/app/xmltipocambio/TC_TI_Portal_xml.xml',
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => '',
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 0,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => 'GET',
CURLOPT_HTTPHEADER => array(
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language: es-ES,es;q=0.9,en;q=0.8',
'Connection: keep-alive',
'Cookie: incap_ses_1619_2355492=YY1MbHjuL29nrYNZGNh3FgwXMGcAAAAAKiPwXriZvD1CjH20JCrT4Q==; visid_incap_2355492=JXwUES73Q0iJUC7yP8fz7yqML2cAAAAAQUIPAAAAAACnDL7+W+pMrttP8iSVofb9; TS01fc2e41=019955ae164e95aa0ff72e02668fc97902a191a5240bc7a84d3b8766d1ca6464a1f5743a921ca51bef417b65cb4ad5de3b1ce9ec19'
),
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;
But the response is:
<html style="height:100%">
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta name="format-detection" content="telephone=no">
<meta name="viewport" content="initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
</head>
<body style="margin:0px;height:100%"><iframe id="main-iframe"
src="/_Incapsula_Resource?CWUDNSAI=27&xinfo=9-25703392-0%200NNN%20RT%281731205587990%2095%29%20q%280%20-1%20-1%20-1%29%20r%280%20-1%29&incident_id=0-134309180844737737&edet=9&cinfo=ffffffff&rpinfo=0&mth=GET"
frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula
incident ID: 0-134309180844737737</iframe></body>
</html>
The correct response should be:
<?xml version="1.0" encoding="utf-8"?>
<tipocambio>
<linktc>http://www.sbs.gob.pe/principal/categoria/tipo-de-cambio/147/c-147</linktc>
<linktilegal>http://www.sbs.gob.pe/principal/categoria/tasa-de-interes-legal/155/c-155</linktilegal>
<linktipromedio>http://www.sbs.gob.pe/principal/categoria/tasa-de-interes-promedio/154/c-154</linktipromedio>
<fecha>08/11/2024</fecha>
<moneda>$</moneda>
<compra>3.764</compra>
<venta>3.769</venta>
</tipocambio>
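Hey, I looked into this and had to do some research on Incapsula specifically. The problem you are seeing is most likely caused by their web security service: your script is probably being blocked because that service has flagged its requests as a threat.
So you can try adding a User-Agent header in your script: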
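<?php
// Fetch the XML with a browser-like User-Agent and save it to disk.
$url = 'https://www.sbs.gob.pe/app/xmltipocambio/TC_TI_Portal_xml.xml';
$userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
$output = curl_exec($ch);

// curl_exec() returns false on failure; do not write a file in that case.
if ($output === false) {
    fwrite(STDERR, 'Download failed: ' . curl_error($ch) . PHP_EOL);
    curl_close($ch);
    exit(1);
}
curl_close($ch);

file_put_contents('downloaded_xml.xml', $output);
echo 'Downloaded XML saved to downloaded_xml.xml';
?>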
Save this script as download_xml.php, run it from the command line with php download_xml.php, and call php download_xml.php directly from your cron job.
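One caveat when running this unattended from cron: if the request is still blocked, the Incapsula HTML page shown in the question would be written to downloaded_xml.xml as if it were the real data. Below is a minimal sketch of a check that could catch this. The function name looksLikeExchangeRateXml is hypothetical (it is not part of the original script), and the sketch assumes PHP's SimpleXML extension is installed and that the real document always has <tipocambio> as its root element, as in the expected output above.

```php
<?php
// Hypothetical helper, not part of the original answer: returns true only when
// $body parses as XML and its root element is <tipocambio>.
function looksLikeExchangeRateXml(string $body): bool
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // An HTML block page either fails to parse or has a different root element.
    return $xml !== false && $xml->getName() === 'tipocambio';
}
```

In download_xml.php, this could be called on $output before file_put_contents, exiting with a non-zero status when it returns false, so that a failed cron run does not overwrite the last good file.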