Selenium fails to retrieve a URL when running in Google Colab

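I built a small web scraper that has run successfully in Google Colab for the past few months. It downloads a set of billing codes from the CMS website. Recently, the driver started throwing timeout exceptions when retrieving some, but not all, URLs. When I run the snippet below locally, it executes successfully. The snippet tries to download a file from each of two URLs; in Colab, retrieving the second URL fails.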

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


def download_documents() -> None:
    """Download billing code documents from CMS"""

    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    driver = webdriver.Chrome(options=chrome_options)

    working_url = "https://www.cms.gov/medicare-coverage-database/view/article.aspx?articleid=59626&ver=6"
    not_working_url = "https://www.cms.gov/medicare-coverage-database/view/lcd.aspx?lcdid=36377&ver=19"

    for row in [working_url, not_working_url]:
        print(f"Retrieving from {row}...")
        driver.get(row) # Fails on second url

        print("Wait for webdriver...")
        wait = WebDriverWait(driver, 2)

        print("Attempting license accept...")
        # Accept license
        try:
            wait.until(EC.element_to_be_clickable((By.ID, "btnAcceptLicense"))).click()
        except TimeoutException:
            pass
        wait = WebDriverWait(driver, 4)
        print("Attempting pop up close...")
        # Click on Close button of the second pop-up
        try:
            wait.until(
                EC.element_to_be_clickable(
                    (
                        By.XPATH,
                        "//button[@data-page-action='Clicked the Tracking Sheet Close button.']",
                    )
                )
            ).click()
        except TimeoutException:
            pass
        print("Attempting download...")
        driver.find_element(By.ID, "btnDownload").click()

download_documents()

Expected behavior: the code above runs successfully in Google Colab, just as it does locally.

A potentially related question: Selenium TimeoutException in Google Colab

python selenium-webdriver selenium-chromedriver google-colaboratory google-notebook
1 Answer

Try adding the following Chrome arguments:

    
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")
    chrome_options.add_argument(
        "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    )
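
To check whether these arguments actually change anything in Colab, it can help to record which URLs load and which time out. The following is only a minimal sketch, not a confirmed fix: `CHROME_ARGS`, `build_driver`, and `check_urls` are hypothetical names introduced here, the 30-second page-load timeout is an assumed value, and the flags are simply the ones listed above.

```python
from typing import Callable, Dict, Iterable, List

# Flags copied from the answer above; the user-agent string is the answer's example value.
CHROME_ARGS: List[str] = [
    "--headless",
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-blink-features=AutomationControlled",
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
]


def build_driver(page_load_timeout: float = 30.0):
    """Create a headless Chrome driver using CHROME_ARGS (hypothetical helper)."""
    # Imported lazily so the rest of this module works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in CHROME_ARGS:
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    # Bound driver.get() so a stalled page raises TimeoutException
    # after the (assumed) timeout instead of waiting much longer.
    driver.set_page_load_timeout(page_load_timeout)
    return driver


def check_urls(fetch: Callable[[str], None], urls: Iterable[str]) -> Dict[str, str]:
    """Call fetch(url) for each URL and record "ok" or the exception class name."""
    results: Dict[str, str] = {}
    for url in urls:
        try:
            fetch(url)
            results[url] = "ok"
        except Exception as exc:  # e.g. Selenium's TimeoutException
            results[url] = type(exc).__name__
    return results
```

In a Colab cell this might be used as `driver = build_driver()`, then `print(check_urls(driver.get, [working_url, not_working_url]))`, followed by `driver.quit()`. If the second URL still reports a timeout with these arguments, the cause probably lies elsewhere, for example in how the site treats requests coming from Colab's network.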