I can't get Selenium Chrome to work in Docker with Python


I have a classic "it works on my machine" problem. A web scraper runs fine on my laptop, but whenever I try to run it in a container I get the same error every time.

My minimal reproducible Dockerized example consists of the following files:

requirements.txt:

selenium==4.23.1  # 4.23.1
pandas==2.2.2
pandas-gbq==0.22.0
tqdm==4.66.2

Dockerfile:

FROM selenium/standalone-chrome:latest

# Set the working directory in the container
WORKDIR /usr/src/app

# Copy your application files
COPY . .

# Install Python and pip
USER root
RUN apt-get update && apt-get install -y python3 python3-pip python3-venv

# Create a virtual environment
RUN python3 -m venv /usr/src/app/venv

# Activate the virtual environment and install dependencies
RUN . /usr/src/app/venv/bin/activate && \
    pip install --no-cache-dir -r requirements.txt

# Switch back to the selenium user
USER seluser

# Set the entrypoint to activate the venv and run your script
CMD ["/bin/bash", "-c", "source /usr/src/app/venv/bin/activate && python -m scrape_ev_files"]

scrape_ev_files.py (trimmed down to what is needed to reproduce the error):

import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service


def init_driver(local_download_path):
    os.makedirs(local_download_path, exist_ok=True)

    # Set Chrome Options    
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--remote-debugging-port=9222")

    prefs = {
        "download.default_directory": local_download_path,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": True
    }
    chrome_options.add_experimental_option("prefs", prefs)

    # Set up the driver
    service = Service()

    chrome_options = Options()
    driver = webdriver.Chrome(service=service, options=chrome_options)

    # Set download behavior
    driver.execute_cdp_cmd("Page.setDownloadBehavior", {
        "behavior": "allow",
        "downloadPath": local_download_path
    })

    return driver

if __name__ == "__main__":
    # PARAMS
    ELECTION = '2024 MARCH 5TH DEMOCRATIC PRIMARY'
    ORIGIN_URL = "https://earlyvoting.texas-election.com/Elections/getElectionDetails.do"
    CSV_DL_DIR = "downloaded_files"

    # initialize the driver
    driver = init_driver(local_download_path=CSV_DL_DIR)

Shell commands to reproduce the error:

docker build -t my_scraper .  # (no error)
docker run --rm -t my_scraper # (error)

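The stack trace is below. Any help would be greatly appreciated! I've tried many iterations of my requirements.txt and Dockerfile to fix this, but the error keeps showing up at exactly the same spot: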

  File "/workspace/scrape_ev_files.py", line 110, in <module>
    driver = init_driver(local_download_path=CSV_DL_DIR)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/scrape_ev_files.py", line 47, in init_driver
    driver = webdriver.Chrome(service=service, options=chrome_options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/chromium/webdriver.py", line 66, in __init__
    super().__init__(command_executor=executor, options=options)
  File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 212, in __init__
    self.start_session(capabilities)
  File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 299, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 354, in execute
    self.error_handler.check_response(response)
  File "/workspace/.venv/lib/python3.12/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
  (session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
2 Answers
0 votes

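You are overwriting the chrome_options variable before passing it to webdriver.Chrome(), so none of the options you set are actually applied, in particular --disable-dev-shm-usage (the option that fixes this problem).

Just remove the chrome_options = Options() line right before the driver is initialized.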
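For illustration, here is a minimal sketch of init_driver with that second Options() line removed, so the configured object is the one that reaches webdriver.Chrome(). The flags and prefs are the question's; the CHROME_ARGS and download_prefs names are hypothetical helpers added here, and turning the download path into an absolute path is my own assumption (Chrome's download settings generally expect one), not part of the answer. Selenium is imported inside the function only so the helpers can be checked without Selenium installed.

```python
import os

# Flags taken from the question; the answer credits --disable-dev-shm-usage
# as the one that stops Chrome from crashing inside Docker.
CHROME_ARGS = [
    "--headless",
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--remote-debugging-port=9222",
]


def download_prefs(download_dir):
    """Chrome download preferences from the question; download_dir should be absolute."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": True,
    }


def init_driver(local_download_path):
    # Imported here so CHROME_ARGS/download_prefs can be inspected without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    # Assumption: resolve to an absolute path for Chrome's download settings.
    download_dir = os.path.abspath(local_download_path)
    os.makedirs(download_dir, exist_ok=True)

    # Build the options exactly once...
    chrome_options = Options()
    for arg in CHROME_ARGS:
        chrome_options.add_argument(arg)
    chrome_options.add_experimental_option("prefs", download_prefs(download_dir))

    # ...and pass that same object on: no second chrome_options = Options() here.
    driver = webdriver.Chrome(service=Service(), options=chrome_options)

    # Same CDP call as in the question, pointed at the absolute directory.
    driver.execute_cdp_cmd("Page.setDownloadBehavior", {
        "behavior": "allow",
        "downloadPath": download_dir,
    })
    return driver
```

Usage stays the same as in the question: driver = init_driver(local_download_path="downloaded_files"), followed by driver.quit() when the scrape is done.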

As a side note, consider using --headless=new instead of --headless. It behaves much more like regular Chrome, and --headless will be deprecated in a future version.
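One way to make that switch in a single place is to build the flag list with the headless mode chosen explicitly. This is only a sketch: chrome_args and apply_args are hypothetical helper names, the default port 9222 comes from the question, and apply_args accepts anything with an add_argument method (such as Selenium's Options).

```python
def chrome_args(new_headless=True, debug_port=9222):
    """Return the question's Chrome flags with the headless mode chosen explicitly.

    new_headless=True uses --headless=new as suggested above; False keeps the
    question's original --headless flag.
    """
    headless_flag = "--headless=new" if new_headless else "--headless"
    return [
        headless_flag,
        "--no-sandbox",
        "--disable-dev-shm-usage",
        f"--remote-debugging-port={debug_port}",
    ]


def apply_args(options, args):
    """Add each flag to an options object (e.g. selenium's Options) and return it."""
    for arg in args:
        options.add_argument(arg)
    return options
```

With Selenium installed, this would be used as chrome_options = apply_args(Options(), chrome_args()), with chrome_options then passed unchanged to webdriver.Chrome().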


-1 votes

I'm not sure whether this is the cause, but there is definitely something wrong with your Python code.


In your code, the chrome_options line is repeated:

# Set Chrome Options
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--remote-debugging-port=9222")

prefs = {
    "download.default_directory": local_download_path,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
}
chrome_options.add_experimental_option("prefs", prefs)

# Set up the driver
service = Service()

chrome_options = Options() # REPEAT HERE
driver = webdriver.Chrome(service=service, options=chrome_options)

Again, I'm not sure whether this is the cause, but removing that line may save you trouble down the road.
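To catch this kind of silent reset earlier, one option is a small guard that checks the options object right before the driver is created. This is a sketch under assumptions: it relies on Selenium's Options exposing the added flags as options.arguments, and missing_flags, check_options and REQUIRED_FLAGS are hypothetical names chosen here. With the repeated Options() line still in place, it would fail with a clear ValueError naming the missing flags instead of the DevToolsActivePort crash.

```python
# Flags this container setup is assumed to depend on (hypothetical choice).
REQUIRED_FLAGS = ("--no-sandbox", "--disable-dev-shm-usage")


def missing_flags(options, required=REQUIRED_FLAGS):
    """Return the required flags that are absent from options.arguments."""
    present = set(options.arguments)
    return [flag for flag in required if flag not in present]


def check_options(options, required=REQUIRED_FLAGS):
    """Raise before Chrome starts if the options object has lost its flags."""
    missing = missing_flags(options, required)
    if missing:
        raise ValueError(f"Chrome options are missing: {', '.join(missing)}")
    return options
```

It would be wired in as driver = webdriver.Chrome(service=service, options=check_options(chrome_options)), so a reset options object fails at that line with an explicit message.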

© www.soinside.com 2019 - 2024. All rights reserved.