Route works in a local Docker container but throws a stack trace error when deployed to Cloud Run

Question

I am developing an application in Python that uses Selenium WebDriver (with Chrome / ChromeDriver) for web scraping.

My route works fine in a local Docker container (200, returns the expected data).

But when I deploy to Google Cloud Run (using the same Docker image I run locally) and call the same route, I get this error:

"Message: 
Stacktrace:
#0 0x2a53f7fc886a <unknown>
#1 0x2a53f7c96e50 <unknown>
#2 0x2a53f7ce6644 <unknown>
#3 0x2a53f7ce6931 <unknown>
#4 0x2a53f7d2c534 <unknown>
#5 0x2a53f7d0b4bd <unknown>
#6 0x2a53f7d299c6 <unknown>..."

Here is my Dockerfile:

# Use the Python image from Docker Hub
FROM python:3.12.4-slim

# Set the working directory
WORKDIR /app

# Install system dependencies for Chrome
RUN apt-get update && \
    apt-get install -y \
    wget \
    unzip \
    gcc \
    g++ \
    make \
    libnss3 \
    libgdk-pixbuf2.0-0 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libcups2 \
    libxkbcommon0 \
    libx11-xcb1 \
    libxcomposite1 \
    libxdamage1 \
    libxrandr2 \
    libxshmfence1 \
    libglib2.0-0 \
    libpango-1.0-0 \
    libpangocairo-1.0-0 \
    libfontconfig1 \
    libnss3 \
    fonts-liberation \
    libasound2 \
    libdrm2 \
    libgbm1 \
    libgtk-3-0 \
    libvulkan1 \
    libxfixes3 \
    xdg-utils \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Install Google Chrome
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install

# Copy dependencies into the container and install
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the rest of the application code into the container
COPY . .

# Expose the port
EXPOSE 8000

# Run the app (with Uvicorn)
CMD ["uvicorn", "app.router:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

My Chrome version is 128.0.6613.119.

In requirements.txt I have:

chromedriver-binary>=128.0.0,<129.0.0

This is the function my route calls:

from selenium import webdriver
import chromedriver_binary

    @staticmethod
    async def search_info(company_name: str):
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-dev-shm-usage')
        options.add_argument('--remote-debugging-port=9222') 

        driver = webdriver.Chrome(options=options)
        final_result = []

        try:
            keywords = ["Talent Acquisition", "Recruiter", "HR", "CEO"]
            for keyword in keywords:
                result = searchInfoService._search_info(
                    driver, company_name, keyword
                )
                if result:
                    final_result.extend(result[:1])
            return final_result
        except Exception as e:
            print(f"An error occurred: {e}")
            raise
        finally:
            driver.quit()

So far, after several hours of research, I have tried:

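Step	Local (Docker) result	Deployed result
Used ChromeDriverManager to install a compatible chromedriver: `driver = webdriver.Chrome(service=ChromeDriverManager().install() options=options)`	❌ Error when running the file	-
Updated `webdriver_manager` to 4.0.2	✅ Works	❌ 503: Service Unavailable
Increased the memory of the deployed container	-	❌ 500: stack trace error
Reconfigured the Dockerfile to install compatible versions of Chrome and ChromeDriver	✅ Works	❌ 500: stack trace error
Current version: simplified the Chrome installation, added `chromedriver-binary` as a dependency (as suggested here and here), updated Selenium to 4.24.0, and removed the `service` arg from the driver (according to the Selenium docs, it is no longer needed): `driver = webdriver.Chrome(options=options)`	✅ Works	❌ 500: stack trace error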

I am completely stumped. If anyone has any insight, I would be very grateful! I am a beginner, so please excuse any mess in the code.

Answer 1

Taking a wild guess here.

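Since this is an async method, if the work runs while no HTTP request is in flight, Cloud Run (with the default CPU allocation) throttles the container's CPU and may shut the instance down. You have a couple of options.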

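Configure minimum instances on your service so that at least one instance is always running. That lets the async work run, and it is handy for development.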

Change the CPU allocation mode to "CPU is always allocated" instead of "CPU is only allocated during request processing".

There is a good explanation of how this works here: https://cloud.google.com/run/docs/configuring/cpu-allocation
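
The concern above only applies to work that outlives the request. As a minimal sketch (not the asker's actual code), the handler below awaits the blocking scrape in a worker thread with `asyncio.to_thread`, so the response is returned only after the work has finished. The names `blocking_scrape` and `handle_request` are hypothetical stand-ins; the real function would drive Selenium.

```python
import asyncio
import time


def blocking_scrape(company_name: str) -> list[str]:
    """Hypothetical stand-in for the blocking Selenium work."""
    time.sleep(0.1)  # simulates slow, synchronous browser automation
    return [f"{company_name}: result"]


async def handle_request(company_name: str) -> list[str]:
    """Hypothetical route handler that finishes all work before responding."""
    # Awaiting here keeps the request open until the scrape completes,
    # and the worker thread keeps the event loop free for other requests.
    return await asyncio.to_thread(blocking_scrape, company_name)


if __name__ == "__main__":
    print(asyncio.run(handle_request("Acme")))
```

If the route already awaits everything before it returns, this guess is less likely to be the cause, and the CPU allocation setting should make no difference.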

Again, this is a wild guess based on what I can see in your post. I have not seen this error before, and I have been using Cloud Run for a long time.
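
Because the error message in the question is empty, it may help to get more detail out of ChromeDriver itself. The sketch below assumes Selenium 4.10 or later, where `Service` accepts `log_output`, and passes ChromeDriver's `--log-level=DEBUG` flag. The helper names `chromedriver_log_args` and `build_driver` are made up for illustration, and the Chrome options simply mirror the ones in the question.

```python
import subprocess


def chromedriver_log_args() -> list[str]:
    """ChromeDriver flags that raise its log verbosity."""
    return ["--log-level=DEBUG"]


def build_driver():
    """Create a headless Chrome driver with verbose ChromeDriver logging."""
    # Imported lazily so the helper above is usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")

    # subprocess.STDOUT lets the driver log flow to the container's standard
    # streams, which Cloud Run collects in its logs.
    service = Service(
        service_args=chromedriver_log_args(),
        log_output=subprocess.STDOUT,
    )
    return webdriver.Chrome(service=service, options=options)
```

With this in place, the Cloud Run logs for a failing request should show what ChromeDriver was doing when Chrome failed to start or crashed, which is usually more informative than the bare stack trace.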

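Side note: you mention that you raised the memory, but may I suggest raising it further, all the way to the maximum, just to rule it out. Scraping with a headless browser (which seems to be what this is doing) uses a lot of memory, especially with Chrome.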
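
The three settings suggested in this answer (minimum instances, always-allocated CPU, more memory) can be applied with the gcloud CLI. The sketch below only builds the command from Python; the service name, region, and memory value are placeholders, and `cloud_run_update_command` is a hypothetical helper, not part of any library.

```python
import subprocess


def cloud_run_update_command(service: str, region: str, memory: str = "2Gi") -> list[str]:
    """Build a gcloud command applying the settings discussed in the answer."""
    return [
        "gcloud", "run", "services", "update", service,
        f"--region={region}",
        "--min-instances=1",    # keep at least one instance running
        "--no-cpu-throttling",  # corresponds to "CPU is always allocated"
        f"--memory={memory}",   # raise the per-instance memory limit
    ]


if __name__ == "__main__":
    # Placeholder service name and region; replace with real values.
    command = cloud_run_update_command("my-scraper", "us-central1")
    print(" ".join(command))
    # Uncomment to apply the change (requires an authenticated gcloud):
    # subprocess.run(command, check=True)
```

Building the argument list separately from running it makes it easy to inspect the exact command before anything is changed on the deployed service.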
