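I'm having some trouble deploying a Selenium scraping script on Render. Locally the script runs fine, but when I deploy it on Render and call the endpoint that triggers the script, it fails with this error:
WebDriverException selenium.common.exceptions.WebDriverException: Message: Service /root/.cache/selenium/chromedriver/linux64/125.0.6422.60/chromedriver unexpectedly exited. Status code was: 127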
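Status code 127 is the exit code the Linux dynamic loader uses when it cannot load a binary's shared libraries (it is also the shell's "command not found" code). That can happen in slim base images even when the chromedriver file itself is present. A minimal sketch for checking this from inside the container, assuming `ldd` is available there (it is normally present in glibc-based Debian images such as python:3.9-slim) and using the driver path from the error message; the helper names `missing_shared_libs` and `check_driver` are hypothetical:

```python
from __future__ import annotations

import shutil
import subprocess

# Driver path taken from the WebDriverException message
DRIVER_PATH = "/root/.cache/selenium/chromedriver/linux64/125.0.6422.60/chromedriver"


def missing_shared_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved libraries as e.g. "\tlibnss3.so => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_driver(path: str = DRIVER_PATH) -> list[str]:
    """Run ldd on the driver binary and list any unresolved shared libraries."""
    if shutil.which("ldd") is None:
        raise RuntimeError("ldd is not available in this environment")
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_shared_libs(result.stdout)
```

Calling `print(check_driver())` in the deployed container would list the libraries that need to be installed there; an empty list means the loader found everything it needs.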
The script is structured as follows. Here is a copy of it; let me know if anything is unclear:
wiki_script.py:
# BeautifulSoup imports
from bs4 import BeautifulSoup
# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
# Flask imports
from flask import Flask, request

'''0. We create the flask app'''
app = Flask(__name__)

'''1. Main function'''
def main_script():
    # Url to scrape
    url = 'https://www.wikipedia.org/'
    # Selenium parameters, headless for deploy
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")
    driver = webdriver.Chrome(options=chrome_options)
    # Opens the url
    driver.get(url)
    # Parse the url with beautifulsoup
    soup = BeautifulSoup(driver.page_source, features="html.parser")
    # Find the class that has the english text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')
    # Get the 'English' text and print it from inside 'strong' attribute
    strong_texts = []
    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())
    print(strong_texts)
    return strong_texts

'''2. Configs for the API and Flask'''
@app.route('/', methods=['GET'])
def home():
    if request.method == 'GET':
        return main_script()

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0')
requirements.txt:
beautifulsoup4
selenium
Flask
webdriver-manager
packaging
gunicorn
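The path in the error (`/root/.cache/selenium/...`) is where Selenium Manager, which ships with Selenium 4.6 and later, caches the drivers it downloads. Because the requirements file is unpinned, it can help to confirm which versions were actually installed in the container. A small sketch using only the standard library; the helper name `installed_versions` is hypothetical:

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from requirements.txt
PACKAGES = ["selenium", "webdriver-manager", "beautifulsoup4", "Flask"]


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: dict[str, str | None] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

Printing `installed_versions(PACKAGES)` at startup (or in a one-off shell on Render) shows whether Selenium Manager is in play and which Flask version will serve the list returned by `home()`.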
Dockerfile:
FROM python:3.9-slim
WORKDIR /
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV PORT=5000
EXPOSE $PORT
CMD ["python", "wiki_script.py"]
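The `python:3.9-slim` base image does not include a Chrome or Chromium browser, and this Dockerfile does not install one or the system libraries it depends on. A hedged startup check for what is actually on the container's PATH; the executable names below are assumptions, since they depend on how the browser is installed, and the helper names are hypothetical:

```python
from __future__ import annotations

import shutil

# Common executable names; which of these exist depends on how Chrome/Chromium is installed (assumption)
BROWSER_NAMES = ["google-chrome", "google-chrome-stable", "chromium", "chromium-browser"]
DRIVER_NAMES = ["chromedriver"]


def find_executables(names: list[str]) -> dict[str, str | None]:
    """Map each executable name to its resolved path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def browser_available() -> bool:
    """True if at least one of the assumed browser executables is on PATH."""
    return any(find_executables(BROWSER_NAMES).values())
```

Logging `find_executables(BROWSER_NAMES + DRIVER_NAMES)` when the Flask app starts makes it obvious on Render whether the image contains a browser at all, before any request reaches Selenium.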
I've already tried a few changes to how the Chrome options are set, but nothing seems to work and I'm a bit lost. Any help would be greatly appreciated.
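The problem may be caused by the driver not being properly installed in your Docker container. Here is a solution that uses webdriver-manager to download a compatible ChromeDriver and run it in your Render environment:
requirements.txt: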
beautifulsoup4
selenium
Flask
webdriver-manager
wiki_script.py:
# webdriver_manager import
from webdriver_manager.chrome import ChromeDriverManager
# BeautifulSoup imports
from bs4 import BeautifulSoup
# Selenium imports
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
# Flask imports
from flask import Flask, request

'''0. We create the flask app'''
app = Flask(__name__)

'''1. Main function'''
def main_script():
    # Url to scrape
    url = 'https://www.wikipedia.org/'
    # Selenium parameters, headless for deploy
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.6045.160 Safari/537.36")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920,1080")
    # Download and configure ChromeDriver automatically
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    try:
        # Opens the url
        driver.get(url)
        # Parse the rendered page source with BeautifulSoup
        soup = BeautifulSoup(driver.page_source, features="html.parser")
    finally:
        # Always quit the browser so each request does not leave a Chrome process running
        driver.quit()
    # Find the class that has the English text
    lang_elements = soup.find_all(class_='central-featured-lang lang2')
    # Get the 'English' text from inside the <strong> tag and print it
    strong_texts = []
    for element in lang_elements:
        strong_tag = element.find('strong')
        if strong_tag:
            strong_texts.append(strong_tag.get_text())
    print(strong_texts)
    return strong_texts

'''2. Configs for the API and Flask'''
@app.route('/', methods=['GET'])
def home():
    if request.method == 'GET':
        return main_script()

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0')
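If the image later gets a chromedriver from a distribution package, you may prefer to use that binary and only fall back to webdriver-manager when none is found. A sketch of that selection logic, with the fallback passed in as a callable so the function does not import Selenium itself; the candidate paths and the name `resolve_driver_path` are assumptions for illustration:

```python
import os
from typing import Callable, Iterable

# Hypothetical locations where a distro package might install chromedriver (assumption)
SYSTEM_DRIVER_CANDIDATES = ["/usr/bin/chromedriver", "/usr/lib/chromium/chromedriver"]


def resolve_driver_path(candidates: Iterable[str], fallback: Callable[[], str]) -> str:
    """Return the first candidate that is an executable file; otherwise call fallback()."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # No system driver found: defer to the supplied fallback (e.g. a download)
    return fallback()
```

In the script above this would be used as `resolve_driver_path(SYSTEM_DRIVER_CANDIDATES, lambda: ChromeDriverManager().install())`, with the result passed to `Service(...)` in place of the direct `ChromeDriverManager().install()` call.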
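Explanation:
webdriver-manager automatically downloads a ChromeDriver version that is compatible with the Chrome version in your Docker container. ChromeDriverManager().install() returns the path to the appropriate chromedriver, and Service(ChromeDriverManager().install()) configures WebDriver to use that downloaded chromedriver. With these changes, your script should automatically download and use a compatible ChromeDriver version inside Docker.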
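Whichever way the driver path is obtained, a quick way to confirm that the binary can actually start inside the container is to run it with `--version` and inspect the exit code; 127 again points at unresolved shared libraries. A minimal sketch; the helper name `probe_binary` is hypothetical:

```python
import subprocess
from typing import Optional, Tuple


def probe_binary(path: str, timeout: float = 10.0) -> Tuple[Optional[int], str]:
    """Run `<path> --version` and return (exit code, combined output).

    The exit code is None when the OS could not execute the file at all
    (for example, it does not exist or is not executable).
    """
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=timeout
        )
    except OSError as exc:
        return None, str(exc)
    return result.returncode, (result.stdout + result.stderr).strip()
```

For example, `probe_binary(ChromeDriverManager().install())` should return exit code 0 and a version string when the driver is usable; a 127 with an "error while loading shared libraries" message means the container still lacks libraries the driver needs.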