ChromeDriver version: 2.41, Chrome version: 69.0.3497.92
Here is my code, which sends multiple requests through a single webdriver with exception handling:
```python
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
driver = webdriver.Chrome('/usr/local/bin/chromedriver', chrome_options=options)
driver.set_page_load_timeout(30)

# links: the list of URLs to visit (defined elsewhere)
for link in links:
    try:
        driver.get(link)
    except TimeoutException as e:
        # do something
        continue
    except Exception as e:
        # do some other thing
        continue
```
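The expected behavior is that if a TimeoutException is thrown, I move on and request the next link, and so on. Instead, once one TimeoutException occurs, every subsequent link also throws a TimeoutException.
Here is the relevant log from chromedriver's logger: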
```
[1536872569.507][SEVERE]: Timed out receiving message from renderer: 29.449
[1536872569.509][INFO]: Timed out. Stopping navigation...
[1536872569.509][DEBUG]: DEVTOOLS COMMAND Page.stopLoading (id=1243) { }
[1536872569.509][DEBUG]: DEVTOOLS RESPONSE Page.stopLoading (id=1243) { }
[1536872569.509][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=1244) { "expression": "1" }
[1536872569.510][SEVERE]: Timed out receiving message from renderer: -0.002
[1536872569.513][INFO]: Done waiting for pending navigations. Status: timeout
[1536872569.513][INFO]: RESPONSE Navigate timeout (Session info: headless chrome=69.0.3497.92)
[1536872569.516][INFO]: COMMAND Navigate { "sessionId": "9caf0bad68147065f14c9c22632cd6d8", "url": "www.example.com" }
[1536872569.516][DEBUG]: DEVTOOLS EVENT Page.frameStoppedLoading { "frameId": "620369B66F0605C0CE359F34F9D95E36" }
[1536872569.516][DEBUG]: DEVTOOLS RESPONSE Runtime.evaluate (id=1244) { "result": { "description": "1", "type": "number", "value": 1 } }
[1536872569.516][INFO]: Waiting for pending navigations...
[1536872569.516][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=1245) { "expression": "1" }
[1536872569.517][DEBUG]: DEVTOOLS RESPONSE Runtime.evaluate (id=1245) { "result": { "description": "1", "type": "number", "value": 1 } }
[1536872599.516][SEVERE]: Timed out receiving message from renderer: 30.000
[1536872599.518][INFO]: Timed out. Stopping navigation...
[1536872599.518][DEBUG]: DEVTOOLS COMMAND Page.stopLoading (id=1246) { }
[1536872599.518][DEBUG]: DEVTOOLS RESPONSE Page.stopLoading (id=1246) { }
[1536872599.518][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=1247) { "expression": "1" }
[1536872599.518][SEVERE]: Timed out receiving message from renderer: -0.002
[1536872599.522][INFO]: Done waiting for pending navigations. Status: timeout
[1536872599.522][INFO]: RESPONSE Navigate timeout (Session info: headless chrome=69.0.3497.92)
[1536872599.524][INFO]: COMMAND Navigate { "sessionId": "9caf0bad68147065f14c9c22632cd6d8", "url": "www.example2.com" }
```
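Comparing this event with other subsequent requests that did not raise any exception, I found these differences:
1) `DEVTOOLS EVENT Page.frameStoppedLoading` appears right after the request to the new "www.example.com" link is sent.
2) After the request to the new URL, the log shows the response to `DEVTOOLS COMMAND Runtime.evaluate (id=1244)`, which was sent for the previous link.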
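Question: is there any way to handle a TimeoutException other than restarting the driver every time?
I'd also really appreciate it if someone could explain this behavior in detail. Thanks.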
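Update: after reading the logs more closely, I realized that sending another request immediately after the timeout causes that request to never be sent at all. The two observations in my original post actually come from successful requests, so you can ignore them.
Here are the logs comparing a successful consecutive request with a consecutive request sent after a timeout exception was handled.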
When chromedriver starts, the browser session is given an id (referred to below as the frameId):
```
[1536915601.693][DEBUG]: DevTools request: http://localhost:34899/json
[1536915601.694][DEBUG]: DevTools response: [ { "description": "", "devtoolsFrontendUrl": "/devtools/inspector.html?ws=localhost:34899/devtools/page/A417CC5AE2C87A4D0FC64CF66B54ED72", "id": "A417CC5AE2C87A4D0FC64CF66B54ED72", "title": "data:,", "type": "page", "url": "data:,", "webSocketDebuggerUrl": "ws://localhost:34899/devtools/page/A417CC5AE2C87A4D0FC64CF66B54ED72" } ]
```
Case 1: a normal request following a successful response:
```
[1536915607.033][INFO]: Done waiting for pending navigations. Status: ok
[1536915607.033][INFO]: RESPONSE GetSource "\u003C!DOCTYPE html>\u003Chtml xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"ko\">\u003Chead>\u003Cmeta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n\u003Cmeta name=\"viewport\" content=\"width=device-width, in..."
[1536915607.044][INFO]: COMMAND Navigate { "sessionId": "d11fb86ec1b49a141f99fe1ec4286a85", "url": "http://www.gelloy.com/product/detail.html?product_no=438&cate_no=30&display_group=1" }
# ------ skipped for brevity ------ #
[1536915607.044][INFO]: Done waiting for pending navigations. Status: ok
[1536915607.044][DEBUG]: DEVTOOLS COMMAND Page.navigate (id=49) { "url": "http://www.gelloy.com/product/detail.html?product_no=438&cate_no=30&display_group=1" }
[1536915609.244][DEBUG]: DEVTOOLS RESPONSE Page.navigate (id=49) { "frameId": "A417CC5AE2C87A4D0FC64CF66B54ED72", "loaderId": "0EB53CDA615428AA73A9DB67F5FF65E1" }
```
Here I can see:
- `COMMAND Navigate`: prepares the next request
- `COMMAND Page.navigate`: actually sends the request
- `RESPONSE Page.navigate`: comes back with the frameId assigned at startup
VS
Case 2: a request sent immediately after a timeout was triggered:
```
[1536872569.513][INFO]: Done waiting for pending navigations. Status: timeout
[1536872569.513][INFO]: RESPONSE Navigate timeout (Session info: headless chrome=69.0.3497.92)
[1536872569.516][INFO]: COMMAND Navigate { "sessionId": "9caf0bad68147065f14c9c22632cd6d8", "url": "www.example.com" }
[1536872569.516][DEBUG]: DEVTOOLS EVENT Page.frameStoppedLoading { "frameId": "620369B66F0605C0CE359F34F9D95E36" }
[1536872569.516][DEBUG]: DEVTOOLS RESPONSE Runtime.evaluate (id=1244) { "result": { "description": "1", "type": "number", "value": 1 } }
[1536872569.516][INFO]: Waiting for pending navigations...
[1536872569.516][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=1245) { "expression": "1" }
[1536872569.517][DEBUG]: DEVTOOLS RESPONSE Runtime.evaluate (id=1245) { "result": { "description": "1", "type": "number", "value": 1 } }
[1536872599.516][SEVERE]: Timed out receiving message from renderer: 30.000
```
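After the timeout, however, I see `COMMAND Navigate` issued with the next URL, but `COMMAND Page.navigate` never follows. So once 30 seconds have passed since `COMMAND Navigate` was issued, the driver decides whether the page has loaded based on the latest `RESPONSE Page.navigate`, and the result is a timeout.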
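To check for this pattern in a longer log, the sequence can be scanned mechanically. The sketch below assumes the chromedriver log format shown above (`[timestamp][LEVEL]: message`, with several entries possibly run together on one line, as in the pasted logs) and flags every `COMMAND Navigate` that is not followed by a `DEVTOOLS COMMAND Page.navigate` before the next `COMMAND Navigate`. The function name `find_unsent_navigations` is hypothetical, and the check encodes only the heuristic described in this post, not a documented chromedriver contract.

```python
import json
import re

# One log entry: "[<unix timestamp>][<LEVEL>]: <message>".
# Entries may be run together on one line, so each message ends where the
# next "[timestamp][LEVEL]: " prefix starts (or at the end of the text).
ENTRY = re.compile(
    r"\[(\d+\.\d+)\]\[(\w+)\]: (.*?)(?=\s*\[\d+\.\d+\]\[\w+\]: |\Z)", re.S
)
# The top-level Navigate command carries a JSON payload with the target URL.
NAVIGATE = re.compile(r"COMMAND Navigate (\{.*\})", re.S)


def find_unsent_navigations(log_text):
    """Return URLs whose 'COMMAND Navigate' never reached 'Page.navigate'.

    Heuristic from the post: a healthy navigation logs
    COMMAND Navigate -> DEVTOOLS COMMAND Page.navigate -> RESPONSE Page.navigate.
    """
    unsent = []
    pending_url = None  # URL of the last Navigate still waiting for Page.navigate
    for _ts, _level, message in ENTRY.findall(log_text):
        message = message.strip()
        match = NAVIGATE.match(message)
        if match:
            # A new Navigate arrived while the previous one was never sent.
            if pending_url is not None:
                unsent.append(pending_url)
            pending_url = json.loads(match.group(1))["url"]
        elif message.startswith("DEVTOOLS COMMAND Page.navigate"):
            pending_url = None  # the request actually went out to the page
    if pending_url is not None:
        unsent.append(pending_url)
    return unsent
```

Run against the first log excerpt in this post, it should flag both `www.example.com` and `www.example2.com`, since neither `COMMAND Navigate` is followed by `Page.navigate`; against the Case 1 excerpt it should return an empty list.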
For now, my workaround is to call `driver.quit()` and open a new browser every time a TimeoutException occurs. Adding `time.sleep(1)` before continuing the loop also seems to work, but I can't be sure that 1 second is always enough.
Here is my updated code:
```python
driver = webdriver.Chrome('/usr/local/bin/chromedriver', chrome_options=options)
driver.set_page_load_timeout(30)

for link in links:
    try:
        driver.get(link)
    except TimeoutException as e:
        # do something
        # Discard the stuck session and start a fresh browser
        driver.quit()
        driver = webdriver.Chrome('/usr/local/bin/chromedriver', chrome_options=options)
        driver.set_page_load_timeout(30)
        continue
    except Exception as e:
        # do some other thing
        continue
```
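The restart logic above can be factored out so the driver configuration lives in one place and the last driver is always shut down. The sketch below is a hypothetical helper: `visit_all`, `make_driver`, `restart_on_timeout` and `settle_seconds` are names introduced here, not Selenium APIs. It relies only on the driver having `get()` and `quit()`, which Selenium's WebDriver provides, and it offers both recoveries mentioned above (restart, or a short sleep on the same driver). The sleep variant inherits the same uncertainty about how long is long enough.

```python
import time


def visit_all(links, make_driver, timeout_error,
              restart_on_timeout=True, settle_seconds=1.0):
    """Visit each link, recovering from page-load timeouts.

    make_driver: zero-argument callable returning a configured driver
        (hypothetical factory; here it would build webdriver.Chrome with the
        same options and call set_page_load_timeout(30)).
    timeout_error: exception class treated as a page-load timeout,
        e.g. selenium.common.exceptions.TimeoutException.
    Returns a list of (link, status) tuples.
    """
    results = []
    driver = make_driver()
    try:
        for link in links:
            try:
                driver.get(link)
                results.append((link, "ok"))
            except timeout_error:
                results.append((link, "timeout"))
                if restart_on_timeout:
                    # Workaround from the post: throw away the stuck session.
                    driver.quit()
                    driver = None  # avoid a second quit() if the restart fails
                    driver = make_driver()
                else:
                    # Alternative from the post: pause before the next request.
                    # The delay is a guess, not a guaranteed fix.
                    time.sleep(settle_seconds)
            except Exception as exc:
                results.append((link, "error: " + type(exc).__name__))
    finally:
        # Always release the last browser, even if the loop raised.
        if driver is not None:
            driver.quit()
    return results
```

With Selenium, `make_driver` would wrap the `webdriver.Chrome(...)` and `set_page_load_timeout(30)` calls from the updated code, and `timeout_error` would be `TimeoutException`.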