Perl with Selenium: cannot save a web page with Ctrl+S


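I am trying to download the complete HTML of a given URL using Perl's Selenium::Chrome. My plan is:

  • Open the web page
  • Press Ctrl+S to open the "Save As" dialog
  • Press Alt+S to accept the default file location/name and save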

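I ran the code attached below, but the page is not saved. When I ran it in non-headless (visible) mode, I observed:

  • The browser opens the specified web page
  • The "Save As" dialog does not open
  • No save operation is performed
  • The browser closes cleanly, with no errors and no saved file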

How can I make this work?

#!/usr/bin/env perl

# seleniumTest.pl

use strict;
use warnings;
use Selenium::Chrome;
use Data::Dumper;
use Selenium::Remote::WDKeys;
use Selenium::Remote::Driver;
use Selenium::ActionChains;

my $url = 'https://www.example.com/foo';
my $profile_path = '/home/cf/.config/google-chrome'; # this is to use my own google account info
my $profile_name = 'Profile 1'; # ditto

my $driver = Selenium::Chrome->new (
    extra_capabilities => {
    'goog:chromeOptions' => {
        args => ['user-data-dir='.$profile_path, 'profile-directory='.$profile_name,
             #   'headless', 'disable-gpu', 'window-size=1920,1080', 'no-sandbox' # uncomment this line to run headless
        ],
        #binary => '/mnt/c/Users/cf/Downloads/chrome-headless-shell-linux64/chrome-headless-shell' # ditto
    }
    }
    );

$driver->set_implicit_wait_timeout(5000);

$driver->get($url); # the browser opens if you don't set the headless mode

warn $driver->get_title(); # This works fine so I believe the selenium works

sleep 10;
warn "opened";
my $html = $driver->find_element("/html");

my $action_chains = Selenium::ActionChains->new(driver => $driver);
$action_chains->key_down( [ KEYS->{'control'}], $html); # I am not sure that it is ok to specify <html> as the element...
$action_chains->send_keys('s');
$action_chains->key_up( [ KEYS->{'control'}], $html);
sleep 10;
warn "try to save";

$action_chains->key_down( [ KEYS->{'alt'}], $html);
$action_chains->send_keys('s');
$action_chains->key_up( [ KEYS->{'alt'}], $html);
warn "saved?";
sleep 10;
$driver->shutdown_binary;
warn "ended";

Tags: perl selenium-webdriver selenium-chromedriver

1 Answer

If your goal is just to get the page source, Selenium::Remote::Driver has a method, get_page_source, that returns the HTML for you, which you can then save to a file:

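use strict;
use warnings;

use Selenium::Chrome;

my $driver = Selenium::Chrome->new;

$driver->get('https://rawley.xyz');

# get_page_source returns the current page source as a string; write it out as UTF-8
open( my $fh, '>:encoding(UTF-8)', 'page_html.html' )
    or die "Cannot open page_html.html: $!";

print {$fh} $driver->get_page_source();

close($fh) or die "Cannot close page_html.html: $!";

$driver->shutdown_binary();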

Alternatively, you can use a user agent such as LWP::UserAgent.
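
A minimal sketch of the LWP::UserAgent route, for illustration only. The helper name save_page and the output file name are hypothetical, and https URLs are assumed to have LWP::Protocol::https installed. Note that a plain user agent does not execute JavaScript and does not use your Chrome profile, so it will only match the browser's output for pages that are served as static HTML without a login.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch $url and write the decoded body to $path.
# Dies with the HTTP status line if the request fails.
sub save_page {
    my ( $url, $path ) = @_;

    my $ua       = LWP::UserAgent->new( timeout => 30 );
    my $response = $ua->get($url);
    die "GET $url failed: " . $response->status_line . "\n"
        unless $response->is_success;

    # decoded_content returns a character string, so encode it on output
    open( my $fh, '>:encoding(UTF-8)', $path )
        or die "Cannot open $path: $!";
    print {$fh} $response->decoded_content;
    close($fh) or die "Cannot close $path: $!";

    return $path;
}

# Run only when a URL and an output file are given on the command line
save_page(@ARGV) if @ARGV == 2;
```

Saved as, say, fetch_page.pl, it could be run as: perl fetch_page.pl https://www.example.com/foo page_html.html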
