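I am trying to create a containerized Express application that starts a Selenium WebDriver instance and does some scraping.
The application below runs perfectly locally. Once it is containerized, the Express server starts as expected, but after a request to / it hangs for about 30 seconds and then crashes.
Can anyone point me in the right direction?
My configuration: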
app.js
import 'chromedriver';
import express from 'express';
import webdriver from 'selenium-webdriver';
const app = express()
const port = 8080
app.get('/', async (req, res) => {
console.log('Starting the browser')
const chromeCapabilities = webdriver.Capabilities.chrome();
//setting chrome options to start the browser fully maximized
const chromeOptions = {
'args': ['--test-type', '--start-maximized', '--headless=new', '--disable-gpu']
};
chromeCapabilities.set('chromeOptions', chromeOptions);
const driver = new webdriver.Builder()
.forBrowser('chrome')
.withCapabilities(chromeCapabilities)
.build();
// console.log('Driver started, going to the website')
await driver.get('https://google.com');
driver.quit();
res.send('Hello World!')
})
// healthcheck
app.get('/health', (req, res) => {
res.send('OK')
});
app.listen(port, () => {
console.log(`Example app listening on port ${port}`)
})
Dockerfile
FROM node:18
# Install Chrome
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install
WORKDIR /usr/src/app
COPY package*.json ./
# Install production dependencies.
RUN npm install --only=production
RUN npm install chromedriver --chromedriver-force-download
# Copy local code to the container image.
COPY . ./
# Run the web service on container startup.
CMD [ "npm", "start" ]
package.json
"dependencies": {
"express": "^4.18.2",
"selenium-webdriver": "^4.18.1",
"supabase": "^1.145.4"
}
Logs
docker run --rm -p 8080:8080 -e PORT=8080 hw
> [email protected] start
> node app.js
Example app listening on port 8080
Starting the browser
/usr/src/app/node_modules/selenium-webdriver/remote/index.js:256
let cancelToken = earlyTermination.catch((e) => reject(Error(e.message)))
^
Error: Server terminated early with status 255
at /usr/src/app/node_modules/selenium-webdriver/remote/index.js:256:70
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Node.js v18.19.1
make: *** [build] Error 1
(.venv) -------------------------------------------------------------------------------------------------------------------------------------------------
The key is to install compatible versions of Chrome and ChromeDriver on the image.
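As a rough illustration of what "compatible" means here, the sketch below compares the major version numbers reported by the two binaries. The helper names (majorVersion, isCompatible) are hypothetical, the sample strings only approximate what chrome --version and chromedriver --version print, and treating a matching major version as sufficient is an assumption.

```javascript
// Hypothetical helper: pull the major version out of a version string
// that contains a four-part number such as 121.0.6167.85.
function majorVersion(versionOutput) {
  const match = /(\d+)\.\d+\.\d+\.\d+/.exec(versionOutput);
  if (!match) {
    throw new Error(`No version number found in: ${versionOutput}`);
  }
  return Number(match[1]);
}

// Hypothetical helper: assume the pair is compatible when the major versions match.
function isCompatible(chromeOutput, driverOutput) {
  return majorVersion(chromeOutput) === majorVersion(driverOutput);
}

// Sample strings are illustrative, not captured output.
console.log(isCompatible('Google Chrome 121.0.6167.85', 'ChromeDriver 121.0.6167.85'));
console.log(isCompatible('Google Chrome 122.0.6261.57', 'ChromeDriver 121.0.6167.85'));
```

Inside the container, the strings could come from running the two binaries with --version, for example via child_process.execFileSync.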
🗎 Dockerfile
FROM node:18
RUN apt-get update -qq -y && \
apt-get install -y \
libasound2 \
libatk-bridge2.0-0 \
libgtk-4-1 \
libnss3 \
xdg-utils \
wget && \
wget -q -O chrome-linux64.zip https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/121.0.6167.85/linux64/chrome-linux64.zip && \
unzip chrome-linux64.zip && \
rm chrome-linux64.zip && \
mv chrome-linux64 /opt/chrome/ && \
ln -s /opt/chrome/chrome /usr/local/bin/ && \
wget -q -O chromedriver-linux64.zip https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/121.0.6167.85/linux64/chromedriver-linux64.zip && \
unzip -j chromedriver-linux64.zip chromedriver-linux64/chromedriver && \
rm chromedriver-linux64.zip && \
mv chromedriver /usr/local/bin/
WORKDIR /usr/src/app
COPY package*.json .
ENV CHROMEDRIVER_SKIP_DOWNLOAD=true
RUN npm install --omit=dev
RUN npm install chromedriver
COPY . .
CMD [ "npm", "start" ]
🗎 app.js
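(I updated the / endpoint so that it returns the content downloaded through Chrome rather than the string 'Hello World!'. This seems like a better demonstration that everything works!)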
import { Builder } from 'selenium-webdriver';
import chrome from 'selenium-webdriver/chrome.js';
import express from "express";
const app = express()
const port = 8080
app.get("/", async (req, res) => {
console.log("Start the browser.")
let chromeOptions = new chrome.Options();
chromeOptions.addArguments('--headless', '--disable-gpu', '--no-sandbox');
let driver = new Builder()
.forBrowser('chrome')
.setChromeOptions(chromeOptions)
.build();
console.log("Done!")
console.log("Open Google.")
await driver.get("https://google.com");
console.log("Done!")
const html = await driver.getPageSource();
// Wait for the browser to shut down before sending the response.
await driver.quit();
res.send(html)
})
app.get("/health", (req, res) => {
res.send("OK")
});
app.listen(port, () => {
console.log(`Example app listening on port ${port}.`)
})
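If the browser fails to start or a page fails to load, the handler above rejects without sending a response. A minimal sketch of one way to guard against that follows; withDriver and makeHandler are hypothetical names, and createDriver is assumed to return (a promise of) an object exposing get(), getPageSource() and quit(), as a selenium-webdriver driver does.

```javascript
// Hypothetical helper: run `task` with a driver and always quit the driver,
// even when the task throws.
async function withDriver(createDriver, task) {
  const driver = await createDriver();
  try {
    return await task(driver);
  } finally {
    await driver.quit();
  }
}

// Hypothetical factory for an Express-style handler that loads `url`
// and responds with the page source.
function makeHandler(createDriver, url) {
  return async (req, res) => {
    try {
      const html = await withDriver(createDriver, async (driver) => {
        await driver.get(url);
        return driver.getPageSource();
      });
      res.send(html);
    } catch (err) {
      // Report the failure to the client instead of leaving the rejection unhandled.
      res.status(500).send(`Browser error: ${err.message}`);
    }
  };
}
```

With the app.js above, this could be wired up as app.get("/", makeHandler(() => new Builder().forBrowser('chrome').setChromeOptions(chromeOptions).build(), "https://google.com")).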
🗎 package.json
{
"name": "selenium-scraper",
"type": "module",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"start": "node app.js"
},
"author": "",
"license": "ISC",
"dependencies": {
"express": "^4.18.2",
"selenium-webdriver": "^4.18.1",
"supabase": "^1.145.4"
}
}
Build and run.
docker build -t express-chrome . && docker run -it -p 8080:8080 express-chrome
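Once the container is running, the /health endpoint can be used to confirm that the server is reachable before requesting /. The sketch below is one possible way to poll it from Node; waitForHealthy is a hypothetical helper, the retry count and delay are arbitrary, and it assumes the global fetch that ships with Node 18 or later (a replacement can be passed as fetchImpl).

```javascript
// Hypothetical helper: poll `url` until it answers with an OK status.
// Resolves to true on success, or false once all attempts are used up.
async function waitForHealthy(url, { attempts = 10, delayMs = 500, fetchImpl = fetch } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      const response = await fetchImpl(url);
      if (response.ok) return true;
    } catch {
      // The server is not accepting connections yet; fall through and retry.
    }
    // Pause before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

For the container started above, a call might look like waitForHealthy('http://localhost:8080/health').then((ok) => console.log(ok ? 'up' : 'down')).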