Selenium WebDriver unexpectedly exits in AWS MWAA

Question (0 votes, 1 answer)

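I am trying to run Selenium on a schedule in AWS MWAA, but Chromium crashes every time with status code -5. I tried googling this status code without success. Any ideas about what causes this error? Alternatively, how should I run Selenium with AWS MWAA? One suggestion I have seen is to run Selenium in a Docker container alongside Airflow, but that is not possible with AWS MWAA.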

Code

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromiumService
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.core.os_manager import ChromeType
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(
    service=ChromiumService(
        ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
    ),
    options=options,
)

Error: chromedriver exits with status code -5

>>> options = Options()
>>> options.add_argument("--headless=new")
>>> driver = webdriver.Chrome(
...             service=ChromiumService(
...                 ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
...             ),
...             options=options,
...         )

DEBUG:selenium.webdriver.common.driver_finder:Skipping Selenium Manager; path to chrome driver specified in Service class: /usr/local/airflow/.wdm/drivers/chromedriver/linux64/114.0.5735.90/chromedriver
DEBUG:selenium.webdriver.common.service:Started executable: `/usr/local/airflow/.wdm/drivers/chromedriver/linux64/114.0.5735.90/chromedriver` in a child process with pid: 19414 using 0 to output -3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/selenium/webdriver/chromium/webdriver.py", line 55, in __init__
    self.service.start()
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/selenium/webdriver/common/service.py", line 102, in start
    self.assert_process_still_running()
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/selenium/webdriver/common/service.py", line 115, in assert_process_still_running
    raise WebDriverException(f"Service {self._path} unexpectedly exited. Status code was: {return_code}")
selenium.common.exceptions.WebDriverException: Message: Service /usr/local/airflow/.wdm/drivers/chromedriver/linux64/114.0.5735.90/chromedriver unexpectedly exited. Status code was: -5
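
A note on reading the status code: the value in the message appears to come from the return code of the chromedriver child process, and in Python's `subprocess` module a negative return code -N on POSIX means the child was terminated by signal N. The sketch below decodes such a value. The helper name `describe_return_code` is hypothetical, and signal numbering is platform dependent (on Linux, signal 5 is SIGTRAP).

```python
import signal


def describe_return_code(return_code: int) -> str:
    """Describe a subprocess return code (hypothetical helper for illustration)."""
    if return_code < 0:
        # Negative values mean "terminated by signal -return_code" on POSIX
        signal_number = -return_code
        try:
            name = signal.Signals(signal_number).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signal_number} ({name})"
    return f"exited with status {return_code}"


if __name__ == "__main__":
    # The status code from the traceback above
    print(describe_return_code(-5))
```

Knowing that the process was killed by a signal rather than exiting on its own suggests looking at the binary and its environment (for example, missing shared libraries) instead of at the Selenium call itself.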

Versions

selenium==4.21.0

webdriver-manager==4.0.2

chromedriver==114.0.5735.90

aws-mwaa-local-runner v2.8.1

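To reproduce this error, download the AWS MWAA local runner v2.8.1, install the requirements listed above, open a bash shell inside the container with

docker exec -it {container_id} /bin/bash

and run the script.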

python selenium-webdriver selenium-chromedriver mwaa
1 Answer
0 votes

Setup

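Because of a misunderstanding, I initially focused on getting this to work without root privileges. There are now two ways to set up the environment.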

Setup without root privileges

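This approach does not require root privileges. foodycoder told me that he cannot run anything that needs root, because he cannot install programs. So here is a method that works without it.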

I have provided a Python setup script (setup.py) here. Run it in the environment and it will set everything up for you.

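Essentially, it downloads Chrome, chromeDriver, and the libraries they need to run, which I had previously installed with root privileges. It then extracts them, makes the binaries executable, and adds the library directory to LD_LIBRARY_PATH so the binaries can find the libraries.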

Here is what it looks like:

import subprocess, zipfile, os


def unzip_file(name, path):
    """
    Unzips a file

    Args:
        name (str): The name of the zip file to unzip
        path (str): The path to the extract directory
    """
    print(f"Unzipping {name} to {path}...")

    # Open the ZIP file
    with zipfile.ZipFile(name, 'r') as zip_ref:
        # Extract all contents into the specified directory
        zip_ref.extractall(path)

    print("Extraction complete!")

    delete_file(name)


def download_file(url):
    """
    Downloads the file from a given url

    Args:
        url (str): The url to download the file from
    """
    download = subprocess.run(["wget", f"{url}"], capture_output=True, text=True)

    # Print the output of the command
    print(download.stdout)


def delete_file(path):
    """
    Deletes a file if it exists

    Args:
        path (str): The path to the file to delete
    """
    # Check if the file exists before attempting to delete
    if os.path.exists(path):
        os.remove(path)
        print(f"File {path} has been deleted.")
    else:
        print(f"The file {path} does not exist.")


def write_to_bashrc(line):
    """
    Appends a line to ~/.bashrc if it is not already present

    Args:
        line (str): The line to write
    """
    # Path to the ~/.bashrc file
    bashrc_path = os.path.expanduser("~/.bashrc")

    # Read the existing lines (the file may not exist yet)
    lines = []
    if os.path.exists(bashrc_path):
        with open(bashrc_path, 'r') as file:
            lines = file.readlines()

    if line not in lines:
        with open(bashrc_path, 'a') as file:
            file.write(line)
        print(f"{line} has been added to ~/.bashrc")
    else:
        print("That is already in ~/.bashrc")


if __name__ == '__main__':
    download_file("https://storage.googleapis.com/chrome-for-testing-public/127.0.6533.119/linux64/chrome-linux64.zip")
    unzip_file("chrome-linux64.zip", ".")
    subprocess.run(["chmod", "+x", "chrome-linux64/chrome"], capture_output=True, text=True)

    download_file("http://tennessene.github.io/chrome-libs.zip")
    unzip_file("chrome-libs.zip", "libs")

    download_file("https://storage.googleapis.com/chrome-for-testing-public/127.0.6533.119/linux64/chromedriver-linux64.zip")
    unzip_file("chromedriver-linux64.zip", ".")
    subprocess.run(["chmod", "+x", "chromedriver-linux64/chromedriver"], capture_output=True, text=True)

    download_file("http://tennessene.github.io/driver-libs.zip")
    unzip_file("driver-libs.zip", "libs")

    current_directory = os.path.abspath(os.getcwd())

    library_line = f"export LD_LIBRARY_PATH={current_directory}/libs:$LD_LIBRARY_PATH\n"

    write_to_bashrc(library_line)

    # Sourcing ~/.bashrc from Python would only affect a short-lived child shell,
    # so the change has to be applied in your own shell before running main.py.
    print("Open a new shell or run `source ~/.bashrc` to apply the change.")
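
Because changes to ~/.bashrc only reach shells started afterwards, another option is to set the library path in the Python process that launches Chrome, so that child processes inherit it. This is a minimal sketch, assuming the `libs` directory created by the setup script; the helper name `env_with_libs` is hypothetical and not part of the original answer.

```python
import os


def env_with_libs(libs_dir, base_env=None):
    """Return a copy of the environment with libs_dir prepended to LD_LIBRARY_PATH.

    Hypothetical helper: base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    libs_dir = os.path.abspath(libs_dir)

    # Keep existing entries, dropping empty ones, and avoid adding a duplicate
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = [part for part in existing.split(os.pathsep) if part]
    if libs_dir not in parts:
        parts.insert(0, libs_dir)

    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    # Processes started after this point inherit the updated variable
    os.environ.update(env_with_libs("libs"))
    print(os.environ["LD_LIBRARY_PATH"])
```

The dynamic linker reads LD_LIBRARY_PATH when a process starts, so this affects chromedriver and Chrome when they are launched later, not the Python process that is already running.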

Setup with root privileges

First, install Chrome. You can download the .rpm package directly from Google:

wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm

Then install it:

sudo rpm -i google-chrome-stable_current_x86_64.rpm

Next, download chromeDriver. Builds are available here.

wget https://storage.googleapis.com/chrome-for-testing-public/127.0.6533.119/linux64/chromedriver-linux64.zip

Extract it:

unzip chromedriver-linux64.zip

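Here is some background before the last step. As you may know, AWS MWAA uses Amazon Linux 2, which is similar to CentOS/RHEL. The list of required libraries I had was for Ubuntu; I worked out the equivalents after stumbling upon one of the libraries I needed, packaged for Oracle Linux.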

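The packages have different names (for example, nss instead of libnss3). I then looked through Amazon's package repository and found them there, under names similar to those of the Oracle Linux packages. The libraries I ended up needing for chromeDriver are: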
nss
nss-utils
nspr
libxcb
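
One way to see which shared libraries a binary still lacks is to run `ldd` against it; unresolved libraries are reported as `not found`. The sketch below parses that output. The helper name `missing_libraries` is hypothetical, and the binary path is the one used elsewhere in this answer.

```python
import shutil
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found' (hypothetical helper)."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "<tab>libnss3.so => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


if __name__ == "__main__":
    if shutil.which("ldd"):
        result = subprocess.run(
            ["ldd", "chromedriver-linux64/chromedriver"],
            capture_output=True,
            text=True,
        )
        print(missing_libraries(result.stdout))
    else:
        print("ldd is not available on this system")
```

The reported names are shared object files, so they still have to be mapped to package names for the distribution in use.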

Finally, install those pesky libraries:

sudo dnf update
sudo dnf install nss nss-utils nspr libxcb

Much better than copying them over by hand!

After that, it should work right away. Make sure your main.py looks like mine.

Running the script

Here is what my main Python script (main.py) ended up looking like:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait


def visit_url(url):
    """
    Navigates to a given url.

    Args:
        url (str): The url of the site to visit (e.g., "https://stackexchange.com/").
    """
    print(f"Visiting {url}")
    driver.get(url)

    WebDriverWait(driver, 10).until(
        lambda driver: driver.execute_script('return document.readyState') == 'complete'
    )


if __name__ == '__main__':
    # Set up Chrome options
    options = Options()
    options.add_argument("--headless")  # Run Chrome in headless mode
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--remote-debugging-port=9222")
    options.binary_location = "chrome-linux64/chrome"

    # Initialize the WebDriver
    driver = webdriver.Chrome(options=options, service=Service("chromedriver-linux64/chromedriver"))

    try:
        visit_url("https://stackoverflow.com/")

        # For debugging purposes (if you can even access it)
        driver.save_screenshot("stack_overflow.png")

    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        # Always shut down the browser; quit() closes every window and ends the session
        print("Finished! Closing...")
        driver.quit()

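Getting it to recognize Chrome is quite finicky, since Chrome is not in its usual location. Still, this is a basic script you can build your program on. It saves a screenshot, and you may be able to watch it work at localhost:9222, although I am not sure how well that will work in this environment.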
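
To check whether the remote debugging port is actually reachable, Chrome's DevTools HTTP endpoint `/json/version` can be queried while the browser is running. This is a minimal sketch, assuming port 9222 as set in main.py; the helper names are hypothetical, and inside MWAA the port may well not be reachable from outside the worker.

```python
import http.client
import json
import urllib.error
import urllib.request


def devtools_version_url(host="localhost", port=9222):
    """Build the URL of the DevTools version endpoint."""
    return f"http://{host}:{port}/json/version"


def fetch_devtools_version(host="localhost", port=9222, timeout=2.0):
    """Return the parsed /json/version response, or None if it cannot be fetched."""
    url = devtools_version_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply
        return None


if __name__ == "__main__":
    info = fetch_devtools_version()
    print(info.get("Browser") if info else "DevTools endpoint not reachable")
```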
