Prevent Selenium Chrome (Remote WebDriver) + NetworkInterceptor from upgrading http to https

Question

I am using a Selenium hub with a Chromium node instance, set up as follows:

version: '3.8'
name: 'selenium-feature-extractor-dev'
services:
  selenium-hub:
    image: seleniarm/hub:4.18.1-20240327
    ports:
      - "4444:4444"
  selenium-chrome:
    image: seleniarm/node-chromium:123.0-chromedriver-123.0-grid-4.18.1-20240327
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=10
      - SE_OPTS=--grid-url http://localhost:4444

I also use a NetworkInterceptor to force certain headers, like this:

    try (NetworkInterceptor ignored = addHeadersToRequests(driver, headers)) {
       ...
    }

    private NetworkInterceptor addHeadersToRequests(WebDriver driver, Map<String, String> headers) {
        if (driver instanceof HasDevTools && headers != null) {
            Filter filter = next -> req -> {
                headers.forEach((key, value) -> {
                    req.removeHeader(key);
                    req.addHeader(key, value);
                });
                return next.execute(req);
            };
            return new NetworkInterceptor(driver, filter);
        }
        return null;
    }

When I request an http site, at some point the request is upgraded to https, and an empty HTML page is returned.

The first place I see this in the logs is here:

2024-04-03T15:51:27.558+01:00  INFO 26651 --- [ CDP Connection] o.openqa.selenium.devtools.Connection    : <- {"method":"Fetch.requestPaused","params":{"requestId":"interception-job-1.0","request":{"url":"https://mysite.pro/","method":"GET","headers":{"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7","Accept-Language":"en-GB,en-US;q=0.9,en;q=0.8","Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"},"initialPriority":"VeryHigh","referrerPolicy":"strict-origin-when-cross-origin"},"frameId":"D3EC8B769B07AD51B5752B137EF1916B","resourceType":"Document"},"sessionId":"B0A124EC0331A37418F1136BB92D1F32"}
2024-04-03T15:51:27.558+01:00  INFO 26651 --- [ CDP Connection] o.openqa.selenium.devtools.Connection    : Method Fetch.requestPaused called with 2 callbacks available
2024-04-03T15:51:27.559+01:00  INFO 26651 --- [ CDP Connection] o.openqa.selenium.devtools.Connection    : Matching Fetch.requestPaused with Fetch.requestPaused
2024-04-03T15:51:27.563+01:00  INFO 26651 --- [ CDP Connection] o.openqa.selenium.devtools.Connection    : Calling callback for Fetch.requestPaused using org.openqa.selenium.devtools.idealized.Network$$Lambda/0x00000003017dd230@21408fe2 being passed org.openqa.selenium.devtools.v122.fetch.model.RequestPaused@1b7b7141
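
To confirm the upgrade programmatically rather than by reading the log, one option is to compare the URL that was requested with the URL reported in `Fetch.requestPaused`. The following is a minimal sketch using only the JDK; `SchemeUpgradeCheck` and `isHttpsUpgrade` are hypothetical names introduced here for illustration and are not part of Selenium.

```java
import java.net.URI;

public class SchemeUpgradeCheck {

    // Returns true when the requested URL used http but the observed URL
    // uses https on the same host, i.e. the scheme was upgraded.
    public static boolean isHttpsUpgrade(String requestedUrl, String observedUrl) {
        URI requested = URI.create(requestedUrl);
        URI observed = URI.create(observedUrl);
        return "http".equalsIgnoreCase(requested.getScheme())
                && "https".equalsIgnoreCase(observed.getScheme())
                && requested.getHost() != null
                && requested.getHost().equalsIgnoreCase(observed.getHost());
    }

    public static void main(String[] args) {
        // The https URL is the one from the log above; the http form is what was requested.
        System.out.println(isHttpsUpgrade("http://mysite.pro/", "https://mysite.pro/"));
        System.out.println(isHttpsUpgrade("http://mysite.pro/", "http://mysite.pro/"));
    }
}
```

In a real test the observed URL would come from whatever the filter sees in `req.getUri()`, which is relative to the intercepted request, so this helper is only a starting point.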

If I do not use the NetworkInterceptor to apply the headers, the request is not upgraded; the http site is requested and retrieved correctly.

This leads me to believe the problem lies with the NetworkInterceptor rather than any other component.

How can I stop the request from being upgraded? Not every site uses https, and I don't want that decision made for me.

Info: I have set ChromeOptions to accept insecure certificates, along with these switches:

            "--disable-dev-shm-usage",
            "--ignore-certificate-errors",
            "--disable-blink-features=AutomationControlled",
            "--log-level=DEBUG",
            "--whitelisted-ips",
            "--disable-gpu",
            "--disable-software-rasterizer",
            "--verbose",
            "--remote-allow-origins=*",
            "--ignore-urlfetcher-cert-requests",
            "--allow-running-insecure-content"
google-chrome selenium-webdriver
1 Answer

It turns out this is a bug in Selenium 4.18.1.

According to the developer on our side who fixed the issue (or rather, created the workaround):

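Selenium does not correctly handle requests that fail to get a response (for example, a timeout or a connection failure). Instead, it returns a 200 response to Chrome, so Chrome incorrectly carries on processing the page. The workaround is to intercept the handling and return the special PROCEED_WITH_REQUEST response, which causes Selenium not to override the response before handing it back to Chrome. This makes Chrome behave as if network interception were not enabled.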


    private NetworkInterceptor addHeadersToRequests(WebDriver driver, Map<String, String> headers) {
        if (driver instanceof HasDevTools && headers != null) {
            Filter filter = next -> req -> {
                headers.forEach((key, value) -> {
                    req.removeHeader(key);
                    req.addHeader(key, value);
                });
                return next.execute(req);
            };
            // FIXME: We should be using NetworkInterceptor(driver, filter) here but can't until Selenium fix their bug
            DevTools tools = ((HasDevTools) driver).getDevTools();
            tools.createSessionIfThereIsNotOne(driver.getWindowHandle());
            BugFixedNetwork network = new BugFixedNetwork(tools);
            network.interceptTrafficWith(filter);
        }
        return null;
    }

// This is a workaround for a bug in Selenium 4.18.1, which doesn't handle requests that fail to connect
// e.g. due to DNS failures or timeouts.
private static class BugFixedNetwork extends v122Network {

    public BugFixedNetwork(DevTools devTools) {
        super(devTools);
    }

    @Override
    public Either<HttpRequest, HttpResponse> createSeMessages(RequestPaused pausedReq) {
        if (pausedReq.getResponseErrorReason().isPresent() && pausedReq.getResponseStatusCode().isEmpty()) {
            return Either.right(PROCEED_WITH_REQUEST);
        }
        return super.createSeMessages(pausedReq);
    }
}
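
The condition inside `createSeMessages` can be reasoned about separately from Selenium. The sketch below models it with plain `Optional` values so the cases are easy to see: a paused request that carries an error reason but no status code is treated as a failed request and passed through, and everything else goes to the default handling. `PausedRequestRule`, `Action`, and `decide` are hypothetical names used only for illustration, and the error reason is simplified to a `String`; the real `RequestPaused` type carries more fields than this.

```java
import java.util.Optional;

public class PausedRequestRule {

    public enum Action { PROCEED_WITH_REQUEST, DELEGATE_TO_DEFAULT }

    // Mirrors the check in BugFixedNetwork: an error reason with no status
    // code means the request never produced a response.
    public static Action decide(Optional<String> errorReason, Optional<Integer> statusCode) {
        if (errorReason.isPresent() && !statusCode.isPresent()) {
            return Action.PROCEED_WITH_REQUEST;
        }
        return Action.DELEGATE_TO_DEFAULT;
    }

    public static void main(String[] args) {
        // Failed request: error reason present, no status code.
        System.out.println(decide(Optional.of("ConnectionRefused"), Optional.empty()));
        // Normal response: status code present, no error.
        System.out.println(decide(Optional.empty(), Optional.of(200)));
        // Request stage: neither field is set yet.
        System.out.println(decide(Optional.empty(), Optional.empty()));
    }
}
```

Only the first case short-circuits; the other two fall through to the default handling, which is why the workaround should leave ordinary header rewriting untouched.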