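How do I scrape a website using the iPad user agent?
I used the following code in PHP. It outputs the page source, but I still can't find the tag. When the site is loaded on an iPad, or in Safari with the iPad user agent, the tag does show up.
Thanks!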
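<?php
// User-agent string sent by Safari on an iPad.
$useragent = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";
$ch = curl_init("http://www.cbsnews.com/video/watch/?id=7370279n&tag=mg;mostpopvideo");
curl_setopt($ch, CURLOPT_USERAGENT, $useragent); // set user agent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
// curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // uncomment to follow redirects
$output = curl_exec($ch);
if ($output === false) {
    // Report transport-level failures instead of printing an empty page.
    echo "cURL error: " . curl_error($ch);
} else {
    echo $output;
}
curl_close($ch);
?>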
Try using curl from the command line, driven by a Perl script like the following:
use strict;
use warnings;

my $ua = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";
my $curl = "curl -A '$ua'";
my $server = "http://www.cbsnews.com";
my $startpage = "$server/video/watch/?id=7370279n&tag=mg;mostpopvideo";
my $path = "/path/to/download/to";
# The URL is single-quoted because '&' and ';' are shell metacharacters.
open(my $fh, "-|", "$curl -L '$startpage'") or die "Cannot open website: $!";
while (<$fh>)
{
    # Link to a file directly under the server root.
    if (/<a\s+[^>]*href="\Q$server\E\/([^"\/]+)"/)
    {
        my $file = $1;
        system("$curl -e '$startpage' '$server/$file' > '$path/$file'");
        next;
    }
    # Link to a file inside one or more folders.
    if (/<a\s+[^>]*href="\Q$server\E\/([^"]+)\/([^"\/]+)"/)
    {
        my $folder = $1;
        my $file = "$folder/$2";
        system("mkdir -p '$path/$folder'");
        system("$curl -e '$startpage' '$server/$file' > '$path/$file'");
        next;
    }
}
close($fh);
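Before running the whole script, it may help to try the request by hand from the shell and check whether the markup you expect is in the response at all. The following is only a minimal sketch: the names `UA`, `URL`, and `fetch_as_ipad` are made up for illustration, and it assumes `curl` is installed.

```shell
#!/bin/sh
# Hypothetical variable names; the values are copied from the question.
UA="Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10"
# Single quotes keep the shell from interpreting '&' and ';' in the URL.
URL='http://www.cbsnews.com/video/watch/?id=7370279n&tag=mg;mostpopvideo'

# Hypothetical helper: fetch a page while identifying as an iPad.
#   -s  hide the progress meter
#   -L  follow redirects
#   -A  set the User-Agent header
fetch_as_ipad() {
    curl -s -L -A "$UA" "$1"
}
```

After sourcing this, `fetch_as_ipad "$URL" > page.html` saves the response so you can search it for the tag. If the tag is missing there too, the page is probably adding it with JavaScript after loading, which curl does not execute.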