I'm using a Selenium hub with a Chromium node instance, as follows:
version: '3.8'
name: 'selenium-feature-extractor-dev'
services:
  selenium-hub:
    image: seleniarm/hub:4.18.1-20240327
    ports:
      - "4444:4444"
  selenium-chrome:
    image: seleniarm/node-chromium:123.0-chromedriver-123.0-grid-4.18.1-20240327
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=10
      - SE_OPTS=--grid-url http://localhost:4444
I'm also using a NetworkInterceptor to force certain headers, like this:
try (NetworkInterceptor ignored = addHeadersToRequests(driver, headers)) {
    ...
}
private NetworkInterceptor addHeadersToRequests(WebDriver driver, Map<String, String> headers) {
    if (driver instanceof HasDevTools && headers != null) {
        // Replace each configured header on every outgoing request
        Filter filter = next -> req -> {
            headers.forEach((key, value) -> {
                req.removeHeader(key);
                req.addHeader(key, value);
            });
            return next.execute(req);
        };
        return new NetworkInterceptor(driver, filter);
    }
    return null;
}
When I request an http site, at some point the request gets upgraded to https and an empty HTML page is returned.
The first place I see this in the logs is here:
2024-04-03T15:51:27.558+01:00 INFO 26651 --- [ CDP Connection] o.openqa.selenium.devtools.Connection : <- {"method":"Fetch.requestPaused","params":{"requestId":"interception-job-1.0","request":{"url":"https://mysite.pro/","method":"GET","headers":{"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7","Accept-Language":"en-GB,en-US;q=0.9,en;q=0.8","Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"},"initialPriority":"VeryHigh","referrerPolicy":"strict-origin-when-cross-origin"},"frameId":"D3EC8B769B07AD51B5752B137EF1916B","resourceType":"Document"},"sessionId":"B0A124EC0331A37418F1136BB92D1F32"}
2024-04-03T15:51:27.558+01:00 INFO 26651 --- [ CDP Connection] o.openqa.selenium.devtools.Connection : Method Fetch.requestPaused called with 2 callbacks available
2024-04-03T15:51:27.559+01:00 INFO 26651 --- [ CDP Connection] o.openqa.selenium.devtools.Connection : Matching Fetch.requestPaused with Fetch.requestPaused
2024-04-03T15:51:27.563+01:00 INFO 26651 --- [ CDP Connection] o.openqa.selenium.devtools.Connection : Calling callback for Fetch.requestPaused using org.openqa.selenium.devtools.idealized.Network$$Lambda/0x00000003017dd230@21408fe2 being passed org.openqa.selenium.devtools.v122.fetch.model.RequestPaused@1b7b7141
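If I don't use the NetworkInterceptor to apply the headers, the request is not upgraded: the http site is requested and retrieved correctly.
This leads me to believe the problem lies with the NetworkInterceptor rather than any other component.
How do I stop the request from being upgraded? Not every website uses https, and I don't want that decision made for me.
Info: I've set ChromeOptions to accept insecure certificates, along with these switches: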
"--disable-dev-shm-usage",
"--ignore-certificate-errors",
"--disable-blink-features=AutomationControlled",
"--log-level=DEBUG",
"--whitelisted-ips",
"--disable-gpu",
"--disable-software-rasterizer",
"--verbose",
"--remote-allow-origins=*",
"--ignore-urlfetcher-cert-requests",
"--allow-running-insecure-content"
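It turns out this is a bug in Selenium 4.18.1.
According to the developer on our side who fixed the issue (or rather, created a workaround):
Selenium does not correctly handle requests that fail to get a response (for example, a timeout or a connection failure). Instead, it returns a 200 response to Chrome, so Chrome wrongly carries on processing the page. The workaround is to intercept the handling and return the special PROCEED_WITH_REQUEST response, which stops Selenium from overriding the response before it is handed back to Chrome. This makes Chrome behave as if network interception were not enabled.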
private NetworkInterceptor addHeadersToRequests(WebDriver driver, Map<String, String> headers) {
    if (driver instanceof HasDevTools && headers != null) {
        // Replace each configured header on every outgoing request
        Filter filter = next -> req -> {
            headers.forEach((key, value) -> {
                req.removeHeader(key);
                req.addHeader(key, value);
            });
            return next.execute(req);
        };
        // FIXME: We should be using NetworkInterceptor(driver, filter) here but can't until Selenium fix their bug
        DevTools tools = ((HasDevTools) driver).getDevTools();
        tools.createSessionIfThereIsNotOne(driver.getWindowHandle());
        BugFixedNetwork network = new BugFixedNetwork(tools);
        network.interceptTrafficWith(filter);
    }
    // No NetworkInterceptor is created by the workaround, so there is nothing to return;
    // try-with-resources skips close() for a null resource.
    return null;
}
// This is a workaround for a bug in Selenium 4.18.1, which doesn't handle requests that fail to connect,
// e.g. due to DNS failures or timeouts.
private static class BugFixedNetwork extends v122Network {
    public BugFixedNetwork(DevTools devTools) {
        super(devTools);
    }

    @Override
    public Either<HttpRequest, HttpResponse> createSeMessages(RequestPaused pausedReq) {
        // An error reason with no status code means the request failed before any response arrived
        if (pausedReq.getResponseErrorReason().isPresent() && pausedReq.getResponseStatusCode().isEmpty()) {
            return Either.right(PROCEED_WITH_REQUEST);
        }
        return super.createSeMessages(pausedReq);
    }
}
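For anyone who wants to reason about the check in createSeMessages without a running grid, here is a minimal sketch of just that decision, with no Selenium dependency. Everything in it is a stand-in I made up for illustration: the class PausedRequestDecision, the Action enum and the decide method are hypothetical, and the two Optional parameters only mimic the values the workaround reads from getResponseErrorReason() and getResponseStatusCode(). "ConnectionRefused" is used purely as a sample error reason.

```java
import java.util.Optional;

// Hypothetical stand-in for the decision made in BugFixedNetwork.createSeMessages.
// None of these types are Selenium classes.
public class PausedRequestDecision {

    public enum Action {
        PROCEED_WITH_REQUEST,   // let Chrome handle the failure itself
        DELEGATE_TO_SELENIUM    // fall through to the default handling
    }

    // Mirrors the workaround's condition: an error reason is present
    // and no response status code was ever received.
    public static Action decide(Optional<String> errorReason, Optional<Integer> statusCode) {
        boolean failedWithoutResponse = errorReason.isPresent() && statusCode.isEmpty();
        return failedWithoutResponse ? Action.PROCEED_WITH_REQUEST : Action.DELEGATE_TO_SELENIUM;
    }

    public static void main(String[] args) {
        // Connection failed before any response arrived
        System.out.println(decide(Optional.of("ConnectionRefused"), Optional.empty()));
        // Normal response with a status code
        System.out.println(decide(Optional.empty(), Optional.of(200)));
        // Request stage: neither an error nor a status code yet
        System.out.println(decide(Optional.empty(), Optional.empty()));
    }
}
```

Only the first case takes the PROCEED_WITH_REQUEST path. Ordinary responses and requests that have not been sent yet still go through the default handling, so the header filter keeps applying to them.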