Deploying a Selenium/Flask/Docker script in Python

Question  Votes: 0  Answers: 1

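I'm having some problems deploying a Selenium scraping script on Render. Locally the script runs fine, but when I deploy it on Render and hit the endpoint that triggers the script, this is the error it shows: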

WebDriverException selenium.common.exceptions.WebDriverException: Message: Service /root/.cache/selenium/chromedriver/linux64/125.0.6422.60/chromedriver unexpectedly exited. Status code was: 127
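Status code 127 means the chromedriver process never really started. On Linux the dynamic loader exits with 127 when a shared library the binary needs is missing, which is common in slim base images. A minimal diagnostic sketch, assuming you can run Python inside the container: it runs the binary from the error message directly and prints its output, so any "error while loading shared libraries" line becomes visible. `probe_binary` and `DRIVER_PATH` are hypothetical names used only for this example.

```python
import subprocess

# Path copied from the error message; adjust it if a different version was cached.
DRIVER_PATH = "/root/.cache/selenium/chromedriver/linux64/125.0.6422.60/chromedriver"


def probe_binary(path, *args):
    """Run `path *args` and return (exit status, stdout + stderr).

    Returns (None, error message) if the OS refuses to start the file at all,
    for example because it does not exist or is not executable.
    """
    try:
        result = subprocess.run(
            [path, *args],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except OSError as exc:
        return None, str(exc)
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    status, output = probe_binary(DRIVER_PATH, "--version")
    print("exit status:", status)
    # With status 127 the output usually names the missing shared library
    print(output)
```

If the output names a missing `.so` file, the fix belongs in the Docker image (installing the browser and its system libraries), not in the Chrome options.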

The project is structured as follows:

(screenshot of the project structure)

Here is a copy of the files; if anything is unclear, please let me know:

wiki_script.py:

# BeautifulSoup imports
from bs4 import BeautifulSoup

# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Flask imports
from flask import Flask, request


'''0. We create the flask app'''
app = Flask(__name__)


'''1. Main function'''
def main_script():

    # Url to scrape
    url = 'https://www.wikipedia.org/'

    # Selenium parameters, headless for deploy

    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")
    
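    driver = webdriver.Chrome(options=chrome_options)

    try:
        # Open the url
        driver.get(url)

        # Parse the rendered page source with BeautifulSoup
        soup = BeautifulSoup(driver.page_source, features="html.parser")
    finally:
        # Always shut the browser down, even if the page load fails
        driver.quit()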

    # Find the class that has the english text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')

    # Collect the 'English' text from inside each 'strong' tag and print it
    strong_texts = []

    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())

    print(strong_texts)
    return strong_texts
    

'''2. Configs for the API and Flask'''

@app.route('/', methods = ['GET'])

def home():
    if (request.method == 'GET'):

        return main_script()


if __name__=='__main__':
    app.run(debug=True, host='0.0.0.0')

requirements.txt:

beautifulsoup4
selenium
Flask
webdriver-manager
packaging
gunicorn

Dockerfile:

FROM python:3.9-slim

WORKDIR /

COPY requirements.txt requirements.txt

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

ENV PORT=5000
EXPOSE $PORT

CMD ["python", "wiki_script.py"]
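The Dockerfile sets `ENV PORT=5000` and exposes it, but `wiki_script.py` never reads that variable: `app.run()` listens on Flask's default port, which only happens to be 5000 as well. If the platform supplies a different `PORT` at runtime, the container and the app disagree. A small sketch, assuming you want the app to follow the `PORT` variable (`resolve_port` is a hypothetical helper, not part of Flask):

```python
import os


def resolve_port(environ=None, default=5000):
    """Return the port from the PORT environment variable, falling back to `default`."""
    if environ is None:
        environ = os.environ
    value = environ.get("PORT")
    return int(value) if value else default


# Usage in wiki_script.py, replacing the existing app.run call:
#     app.run(host="0.0.0.0", port=resolve_port())
```

Keeping the default at 5000 preserves the current local behaviour while letting the deployed container follow whatever `PORT` it is given.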

I've tried a few changes to how the Chrome options are set up, but nothing seems to work and I'm a bit lost. Any help would be greatly appreciated.

python docker selenium-webdriver flask
1 Answer

Votes: 0

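The problem is probably caused by the driver not being installed in your Docker container. Here is a solution that uses webdriver-manager to download a compatible ChromeDriver and run it in your Render environment: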

requirements.txt:

beautifulsoup4
selenium
Flask
webdriver-manager

wiki_script.py:

# webdriver_manager import
from webdriver_manager.chrome import ChromeDriverManager

# BeautifulSoup imports
from bs4 import BeautifulSoup

# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Flask imports
from flask import Flask, request


'''0. We create the flask app'''
app = Flask(__name__)


'''1. Main function'''
def main_script():
    # Url to scrape
    url = 'https://www.wikipedia.org/'

    # Selenium parameters, headless for deploy
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")

    # Download and configure ChromeDriver automatically
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)

    try:
        # Open the url
        driver.get(url)

        # Parse the rendered page source with BeautifulSoup
        soup = BeautifulSoup(driver.page_source, features="html.parser")
    finally:
        # Always shut the browser down, even if the page load fails
        driver.quit()

    # Find the class that has the english text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')

    # Collect the 'English' text from inside each 'strong' tag and print it
    strong_texts = []

    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())

    print(strong_texts)
    return strong_texts

'''2. Configs for the API and Flask'''

@app.route('/', methods = ['GET'])

def home():
    if (request.method == 'GET'):

        return main_script()


if __name__=='__main__':
    app.run(debug=True, host='0.0.0.0')

Explanation:

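  • No chromedriver path is configured manually.
  • webdriver-manager automatically downloads a ChromeDriver version compatible with the Chrome version in your Docker container.
  • ChromeDriverManager().install() returns the path of the matching chromedriver binary.
  • Service(ChromeDriverManager().install()) configures the WebDriver to use the downloaded chromedriver.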
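chromedriver only drives a browser; it does not replace one, and webdriver-manager tries to detect the installed Chrome version to pick a matching driver. The `python:3.9-slim` base image in the question does not include a browser. A quick sketch that reports whether a Chrome or Chromium executable is on `PATH` before you create the driver; the names in `CANDIDATE_BROWSERS` are assumptions based on common Debian package names, and `find_chrome` is a hypothetical helper. It only checks `PATH`, so a browser installed elsewhere will not be found.

```python
import shutil

# Assumed executable names; the real name depends on how the browser was installed.
CANDIDATE_BROWSERS = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def find_chrome(candidates=CANDIDATE_BROWSERS, path=None):
    """Return the full path of the first candidate found on PATH (or `path`), else None."""
    for name in candidates:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


if __name__ == "__main__":
    browser = find_chrome()
    if browser is None:
        print("No Chrome/Chromium executable found on PATH; install one in the Docker image.")
    else:
        print("Browser found at", browser)
```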

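With these changes, your script should automatically download and use a ChromeDriver version compatible with the Chrome installed in your Docker container.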
