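I am trying to download the entire HTML of a given URL using Perl's Selenium::Chrome. My plan is:
I ran the code attached below, but could not save the page. When I tried it in non-headless (visible) mode, I found:
How can I do this?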
#!/usr/bin/env perl
# seleniumTest.pl
use strict;
use warnings;
use Selenium::Chrome;
use Data::Dumper;
use Selenium::Remote::WDKeys;
use Selenium::Remote::Driver;
use Selenium::ActionChains;
my $url = 'https://www.example.com/foo';
my $profile_path = '/home/cf/.config/google-chrome'; # this is to use my own google account info
my $profile_name = 'Profile 1'; # ditto
my $driver = Selenium::Chrome->new (
extra_capabilities => {
'goog:chromeOptions' => {
args => ['user-data-dir='.$profile_path, 'profile-directory='.$profile_name,
# 'headless', 'disable-gpu', 'window-size=1920,1080', 'no-sandbox' # if you want to do it headless, uncomment this line
],
#binary => '/mnt/c/Users/cf/Downloads/chrome-headless-shell-linux64/chrome-headless-shell' # ditto
}
}
);
$driver->set_implicit_wait_timeout(5000);
$driver->get($url); # the browser opens if you don't set the headless mode
warn $driver->get_title(); # This works fine so I believe the selenium works
sleep 10;
warn "opened";
my $html = $driver->find_element("/html");
my $action_chains = Selenium::ActionChains->new(driver => $driver);
$action_chains->key_down( [ KEYS->{'control'}], $html); # I am not sure that it is ok to specify <html> as the element...
$action_chains->send_keys('s');
$action_chains->key_up( [ KEYS->{'control'}], $html);
sleep 10;
warn "try to save";
$action_chains->key_down( [ KEYS->{'alt'}], $html);
$action_chains->send_keys('s');
$action_chains->key_up( [ KEYS->{'alt'}], $html);
warn "saved?";
sleep 10;
$driver->shutdown_binary;
warn "ended";
If your goal is just to get the page source, Selenium::Remote::Driver has a method, get_page_source, that fetches the HTML for you, which you can then save to a file:
use strict;
use warnings;
use feature 'say';
use Selenium::Chrome;
my $driver = Selenium::Chrome->new;
$driver->get('https://rawley.xyz');
open( my $fh, '>:encoding(UTF-8)', 'page_html.html' ) or die $!; # encode as UTF-8 to avoid "Wide character" warnings
print $fh $driver->get_page_source();
close($fh);
$driver->shutdown_binary();
Alternatively, you could use a user agent such as LWP::UserAgent.
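For pages that do not depend on JavaScript, the LWP::UserAgent route might look like the sketch below. It is a minimal illustration rather than a drop-in replacement: the helper names fetch_html and save_html, the 30-second timeout, and the command-line usage are assumptions made for this example, and fetching https URLs assumes LWP::Protocol::https is installed. Unlike Selenium, this does not run scripts or use a browser profile, so content that is rendered client-side or sits behind a login will be missing.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url and return the decoded HTML as a character string.
# Dies with the HTTP status line if the request does not succeed.
# (fetch_html is a hypothetical helper name, not part of LWP.)
sub fetch_html {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new( timeout => 30 );    # timeout value is an assumption
    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Write a character string to $path, encoded as UTF-8.
# (save_html is a hypothetical helper name as well.)
sub save_html {
    my ( $html, $path ) = @_;
    open( my $fh, '>:encoding(UTF-8)', $path )
        or die "Cannot open $path: $!";
    print {$fh} $html;
    close($fh) or die "Cannot close $path: $!";
    return;
}

# Only fetch when a URL and an output path are given on the command line.
if ( @ARGV == 2 ) {
    my ( $url, $path ) = @ARGV;
    save_html( fetch_html($url), $path );
}
```

Saved as, say, fetch_page.pl, it could be run as `perl fetch_page.pl https://www.example.com/foo page_html.html`. If the saved file lacks content you can see in the browser, the page is probably built by JavaScript and the get_page_source approach is the better fit.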