Deploying a Selenium/Flask/Docker script in Python

Question

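I'm having trouble deploying a Selenium scraping script on Render. Locally the script runs fine, but when I deploy it on Render and hit the endpoint that triggers the script, this is the error it shows:

WebDriverException selenium.common.exceptions.WebDriverException: Message: Service /root/.cache/selenium/chromedriver/linux64/125.0.6422.60/chromedriver unexpectedly exited. Status code was: 127

The project is structured as follows:

[image: structure]

I'll paste copies of the files below; let me know if anything is unclear: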

wiki_script.py:

# BeautifulSoup imports
from bs4 import BeautifulSoup

# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Flask imports
from flask import Flask, request


'''0. We create the flask app'''
app = Flask(__name__)


'''1. Main function'''
def main_script():

    # Url to scrape
    url = 'https://www.wikipedia.org/'

    # Selenium parameters, headless for deploy

    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")
    
    driver = webdriver.Chrome(options=chrome_options)

    # Opens the url
    driver.get(url)

    # Parse the url with beautifulsoup
    soup = BeautifulSoup(driver.page_source, features="html.parser")

    # Find the class that has the english text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')

    # Get the 'English' text and print it from inside 'strong' attribute
    strong_texts = []

    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())

    print(strong_texts)
    return strong_texts
    

'''2. Configs for the API and Flask'''

@app.route('/', methods = ['GET'])

def home():
    if (request.method == 'GET'):

        return main_script()


if __name__=='__main__':
    app.run(debug=True, host='0.0.0.0')

requirements.txt:

beautifulsoup4
selenium
Flask
webdriver-manager
packaging
gunicorn

Dockerfile:

FROM python:3.9-slim

WORKDIR /

COPY requirements.txt requirements.txt

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

ENV PORT=5000
EXPOSE $PORT

CMD ["python", "wiki_script.py"]

I've already tried changing how the Chrome options are set, but nothing seems to really work and I'm a bit lost. Any help would be much appreciated.

Answer 1

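The problem may be that the driver is not installed in your Docker container. Here is a solution that uses webdriver-manager to download a compatible ChromeDriver and run it in your Render environment:

requirements.txt: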

beautifulsoup4
selenium
Flask
webdriver-manager

wiki_script.py:

# webdriver_manager import
from webdriver_manager.chrome import ChromeDriverManager

# BeautifulSoup imports
from bs4 import BeautifulSoup

# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Flask imports
from flask import Flask, request


'''0. We create the flask app'''
app = Flask(__name__)


'''1. Main function'''
def main_script():
    # Url to scrape
    url = 'https://www.wikipedia.org/'

    # Selenium parameters, headless for deploy
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")

    # Download and configure ChromeDriver automatically
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)

    try:
        # Open the URL
        driver.get(url)

        # Parse the rendered page source with BeautifulSoup
        soup = BeautifulSoup(driver.page_source, features="html.parser")
    finally:
        # Always close the browser so Chrome processes do not pile up between requests
        driver.quit()

    # Find the elements whose class holds the English text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')

    # Collect the text of the 'strong' tag inside each element
    strong_texts = []

    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())

    print(strong_texts)
    return strong_texts

'''2. Configs for the API and Flask'''

@app.route('/', methods=['GET'])
def home():
    # The route only accepts GET, so no extra method check is needed
    return main_script()


if __name__=='__main__':
    app.run(debug=True, host='0.0.0.0')
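
The Dockerfile in the question sets ENV PORT=5000, while app.run() is called without a port, so the two only agree because 5000 happens to be Flask's default. A minimal sketch of reading the port from the environment instead. resolve_port is a hypothetical helper, not part of Flask, and it assumes the hosting platform passes the port through a PORT variable:

```python
import os


def resolve_port(environ=None, default=5000):
    """Return the port to bind to, taken from PORT when it is set and valid."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PORT", "")
    try:
        port = int(raw)
    except ValueError:
        # PORT is missing or not a number: fall back to the default
        return default
    # Only accept values inside the valid TCP port range
    return port if 0 < port < 65536 else default
```

In wiki_script.py this could be used as app.run(host='0.0.0.0', port=resolve_port()).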

Explanation:

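Removed the manual configuration of the chromedriver path.
`webdriver-manager` is used to automatically download a ChromeDriver version compatible with the Chrome version in your Docker container.
`ChromeDriverManager().install()` retrieves the path of the appropriate chromedriver.
`Service(ChromeDriverManager().install())` configures WebDriver to use the downloaded chromedriver.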
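
Because ChromeDriverManager().install() sits inside main_script(), the driver lookup is repeated on every request. One option is to resolve the path once per process and reuse it. The sketch below shows that idea only; DriverPathCache is a hypothetical name, and the installer is passed in as a plain callable so the snippet runs without webdriver-manager installed:

```python
import threading


class DriverPathCache:
    """Resolve a driver path once and hand out the same value afterwards."""

    def __init__(self, install):
        # install: zero-argument callable that returns the driver path
        self._install = install
        self._lock = threading.Lock()
        self._path = None

    def get(self):
        # The lock keeps concurrent requests from triggering two downloads
        with self._lock:
            if self._path is None:
                self._path = self._install()
            return self._path
```

In the script above, this might be created once at module level as DriverPathCache(lambda: ChromeDriverManager().install()), with main_script() calling Service(cache.get()) instead of installing on each request.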

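With these changes, your script should automatically download and use a ChromeDriver version compatible with the Chrome installed in your Docker container.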
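
Note that webdriver-manager only supplies the driver. The browser itself still has to exist in the image, and the Dockerfile in the question (python:3.9-slim plus pip packages) does not install one. A small startup check along these lines can give a clearer message than a WebDriverException. find_browser is a hypothetical helper, and the candidate executable names are assumptions about how Chrome or Chromium is commonly installed on Linux:

```python
import shutil

# Assumed executable names; adjust to whatever the image actually installs
CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def find_browser(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If find_browser() returns None inside the container, a browser needs to be added to the Dockerfile before Selenium can start it.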
