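I'm running into a persistent problem trying to deploy a FastAPI + PostgreSQL application to AWS ECS Fargate using docker compose. Despite multiple debugging attempts, I can't get it to work. The exit code is 0, which means there is no error, but the container stops right after it starts.
Running the Dockerized FastAPI and PostgreSQL on my local machine works fine, although loading the ColPali model from Hugging Face before the API endpoints become usable takes a long time (about 8 minutes).
When I deploy the application to ECS, the essential container exits immediately. No logs are written to CloudWatch (even though I have configured awslogs, Container Insights, etc.), and debugging has been a nightmare. I've made sure to include the necessary permissions, including policies for ecsTaskExecutionRole, s3:GetObject, and ECR access.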
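Even when nothing reaches CloudWatch, ECS records how a stopped task ended: the task's stopCode and stoppedReason, plus each container's exitCode and reason, are returned by the DescribeTasks API. The following is a minimal sketch, assuming you pass in a boto3 ECS client (for example `boto3.client("ecs", region_name="us-west-2")`) and supply your own cluster name and task ARN; the function name `summarize_stopped_task` is hypothetical.

```python
"""Minimal sketch: summarize why an ECS task stopped via describe_tasks.

Assumes `ecs_client` is a boto3 ECS client; cluster and task ARN are placeholders.
"""
from typing import Any


def summarize_stopped_task(ecs_client: Any, cluster: str, task_arn: str) -> list[str]:
    """Return readable lines describing how a task and its containers stopped."""
    response = ecs_client.describe_tasks(cluster=cluster, tasks=[task_arn])
    lines: list[str] = []
    # Unknown or expired task ARNs are reported under "failures" instead of raising.
    for failure in response.get("failures", []):
        lines.append(f"lookup failed: {failure.get('arn')} ({failure.get('reason')})")
    for task in response.get("tasks", []):
        # Task-level summary: e.g. whether an essential container exited.
        lines.append(
            f"task {task.get('lastStatus')}: stopCode={task.get('stopCode')} "
            f"stoppedReason={task.get('stoppedReason')}"
        )
        # Per-container exit details; an exitCode of 0 means the process ended normally.
        for container in task.get("containers", []):
            lines.append(
                f"  container {container.get('name')}: "
                f"exitCode={container.get('exitCode')} reason={container.get('reason')}"
            )
    return lines
```

Printing the returned lines for the failing task should show whether the essential container exited on its own (and with which code) rather than being stopped by ECS for a permissions or image-pull problem.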
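I've been following the workflow at https://beabetterdev.com/2023/01/29/ecs-fargate-tutorial-with-fastapi/, with which I was able to successfully deploy a simple FastAPI app that has a few endpoints and no other dependencies.
I use the AWS CLI to create the repository and to build, tag, and push the Docker image.
I create a cluster with the default settings.
I create a new task definition with ecsTaskExecutionRole as the task role, Linux/ARM64 (because I'm on a Mac with an M2 chip), and environment variables loaded from S3.
Under Clusters, I then run a new task from the Tasks tab, mostly with default settings.
At this point I don't know what else to try. Any help would be greatly appreciated!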
Task definition JSON
{
    "taskDefinitionArn": "arn:aws:ecs:us-west-2:<>:task-definition/colpali-search:1",
    "containerDefinitions": [
        {
            "name": "colpali-search-api",
            "image": "<>.dkr.ecr.us-west-2.amazonaws.com/colpali-search:latest",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "colpali-search-port",
                    "containerPort": 8000,
                    "hostPort": 8000,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "environment": [],
            "environmentFiles": [
                {
                    "value": "arn:aws:s3:::colpali-search-bucket/.env",
                    "type": "s3"
                }
            ],
            "mountPoints": [],
            "volumesFrom": [],
            "ulimits": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/colpali-search",
                    "mode": "non-blocking",
                    "awslogs-create-group": "true",
                    "max-buffer-size": "25m",
                    "awslogs-region": "us-west-2",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []
            },
            "systemControls": []
        }
    ],
    "family": "colpali-search",
    "executionRoleArn": "arn:aws:iam::<>:role/ecsTaskExecutionRole",
    "networkMode": "awsvpc",
    "revision": 1,
    "volumes": [],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.ecr-auth"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.28"
        },
        {
            "name": "ecs.capability.env-files.s3"
        },
        {
            "name": "ecs.capability.execution-role-ecr-pull"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "ecs.capability.task-eni"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2",
        "FARGATE"
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "1024",
    "memory": "3072",
    "runtimePlatform": {
        "cpuArchitecture": "ARM64",
        "operatingSystemFamily": "LINUX"
    },
    "registeredAt": "2024-12-03T18:09:53.720Z",
    "registeredBy": "",
    "tags": []
}
docker-compose.yml
services:
  fastapi:
    build: .
    container_name: colpali-search
    depends_on:
      db:
        condition: service_healthy
    networks:
      - app-network
    volumes:
      - ./:/code:ro
    env_file:
      - .env
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - PYTHONPATH=/code
    ports:
      - '8000:8000'
    command: >
      sh -c "alembic upgrade head && fastapi run app.py --port 8000 --workers 4"
  db:
    image: pgvector/pgvector:pg17
    restart: always
    volumes:
      - postgres_data:/var/lib/postgresql/data/
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - 5432:5432
    networks:
      - app-network
    env_file:
      - .env
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    entrypoint: sh -c "chmod 644 /docker-entrypoint-initdb.d/init.sql && docker-entrypoint.sh postgres"
    healthcheck:
      test:
        [
          'CMD-SHELL',
          'pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}',
        ]
      interval: 5s
      timeout: 5s
      retries: 5
networks:
  app-network:
volumes:
  postgres_data:
Dockerfile
FROM python:3.12-slim as base
RUN apt-get update && apt-get install -y poppler-utils && rm -rf /var/lib/apt/lists/*
ENV POETRY_VERSION=1.6.1 \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    POETRY_HOME="/opt/poetry" \
    POETRY_VIRTUALENVS_IN_PROJECT=true \
    POETRY_NO_INTERACTION=1 \
    PYSETUP_PATH="/opt/pysetup" \
    VENV_PATH="/opt/pysetup/.venv"
ENV PATH="$POETRY_HOME/bin:$VENV_PATH/bin:$PATH"

FROM base as builder
RUN --mount=type=cache,target=/root/.cache \
    pip install "poetry==$POETRY_VERSION"
WORKDIR $PYSETUP_PATH
COPY ./poetry.lock ./pyproject.toml ./
RUN --mount=type=cache,target=$POETRY_HOME/pypoetry/cache \
    poetry install --no-dev

FROM base as production
ENV FASTAPI_ENV=production
COPY --from=builder $VENV_PATH $VENV_PATH
COPY ./colpali_search /colpali_search
COPY .env /colpali_search/
COPY alembic.ini /colpali_search/
COPY alembic /colpali_search/alembic
WORKDIR /colpali_search
EXPOSE 8000
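Your Dockerfile does not define a CMD.
You do define a command in your docker-compose.yml file.
You do not define a command in your ECS task definition.
The only place you define the run command for your Docker image is the docker-compose.yml file, which is only used when you run the application locally. With your current configuration, I believe the command is inherited from the base image python:3.12-slim, whose default command just starts the python3 interpreter with no arguments; with no terminal attached it reads an empty stdin and exits with code 0. That is most likely why the container exits so quickly, without any logs, after you deploy the ECS task.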
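One way to act on this without rebuilding the image is to set `command` on the container definition, which overrides the image's default command. The sketch below is a minimal Python example, assuming you saved the task definition JSON from the question to a file and want the same startup command that docker-compose.yml uses. It strips the read-only fields that appear in a described task definition but are not accepted when registering a new revision. The names `with_container_command` and `COMPOSE_COMMAND` are hypothetical, and it assumes `app.py` ends up in the image's WORKDIR (`/colpali_search`).

```python
"""Minimal sketch: add a `command` to an ECS task definition exported as JSON.

Assumption: the input dict is the task definition JSON from the question; the
result is intended for `aws ecs register-task-definition --cli-input-json`.
"""
import copy

# Fields returned when describing a task definition but not accepted on registration.
READ_ONLY_FIELDS = (
    "taskDefinitionArn",
    "revision",
    "status",
    "requiresAttributes",
    "compatibilities",
    "registeredAt",
    "registeredBy",
)

# Hypothetical: the same startup command docker-compose.yml passes to the fastapi service.
COMPOSE_COMMAND = [
    "sh",
    "-c",
    "alembic upgrade head && fastapi run app.py --port 8000 --workers 4",
]


def with_container_command(task_def: dict, container_name: str, command: list[str]) -> dict:
    """Return a copy of task_def, minus read-only fields, whose named container runs `command`."""
    new_def = copy.deepcopy(task_def)  # leave the caller's dict untouched
    for field in READ_ONLY_FIELDS:
        new_def.pop(field, None)
    for container in new_def["containerDefinitions"]:
        if container["name"] == container_name:
            container["command"] = list(command)
            return new_def
    raise KeyError(f"no container named {container_name!r}")
```

Usage: load the file with `json.load`, call `with_container_command(task_def, "colpali-search-api", COMPOSE_COMMAND)`, write the result back out with `json.dump`, and register it as a new revision. Alternatively, adding an equivalent CMD to the final Dockerfile stage keeps the startup command inside the image. Either way, `alembic upgrade head` needs the DATABASE_URL from the S3 .env to point at a database the task can reach, since the compose `db` service does not exist on ECS.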