Problem deploying a Flask-based machine learning model to a Vertex AI endpoint using a custom container


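I am deploying a Flask application, packaged as a Docker container, that serves a PyTorch machine learning model, to a Vertex AI endpoint for online prediction. Although the Flask application starts successfully inside the container (as my logs show), my attempts to deploy the model to the Vertex AI endpoint consistently fail.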

Here are the details of my setup:

Model name: EnsembleFlaskClassifierV2
Model ID: 8528383931077099520
Region: asia-southeast1
Container image: asia-southeast1-docker.pkg.dev/fyp-jx-416511/ensembleflask/ensembleflask-app:latest
Machine type used for deployment: n1-standard-4

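My Flask application (ensemble_deploy_1.py) initializes successfully, loads the PyTorch models, and is designed to make predictions with an ensemble method when it receives a POST request. The Flask application is set to run on port 5001, and this has been exposed and specified correctly in the Dockerfile.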

Dockerfile configuration:

# Use an official PyTorch image as the base
FROM pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime

# Set the working directory in the container
WORKDIR /usr/src/app

# Install dependencies
RUN pip install numpy Pillow flask torch torchvision

# Copy the Flask script, model files, and entrypoint script into the container
COPY ensemble_deploy_1.py .
COPY Models/DenseNet_Optimal.pt .
COPY Models/ResNext_Optimal.pt .
COPY Models/MobileNetV2_Optimal.pt .
COPY entrypoint.sh .

# Make the entrypoint script executable
RUN chmod +x entrypoint.sh

# Set environment variable to specify the Flask application
ENV FLASK_APP=ensemble_deploy_1.py

# EXPOSE command is commented out because it's not necessary for Vertex AI
# EXPOSE 5001

# Use entrypoint.sh to start the service
ENTRYPOINT ["./entrypoint.sh"]

entrypoint.sh:

#!/bin/sh
# This script sets the FLASK_APP environment variable and starts the Flask server
export FLASK_APP=ensemble_deploy_1.py
flask run --host=0.0.0.0 --port=5001
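
One thing that may be relevant here: Vertex AI does not learn the port from the Dockerfile or from `flask run`. It sends health checks and prediction requests to the port and routes configured on the uploaded model, and exposes those values to the container through the `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE`, and `AIP_PREDICT_ROUTE` environment variables (the port defaults to 8080 when none is set on the model). A minimal sketch of reading them in Python follows; `resolve_serving_config` and the fallback routes `/health` and `/predict` are hypothetical names chosen for illustration, not something from the original script.

```python
import os


def resolve_serving_config(env=None):
    """Return (port, health_route, predict_route) for the HTTP server.

    Reads the AIP_* variables that Vertex AI sets in the container and
    falls back to local-development values when they are absent.
    The fallback routes are illustrative assumptions.
    """
    env = os.environ if env is None else env
    port = int(env.get("AIP_HTTP_PORT", "8080"))
    health_route = env.get("AIP_HEALTH_ROUTE", "/health")
    predict_route = env.get("AIP_PREDICT_ROUTE", "/predict")
    return port, health_route, predict_route


port, health_route, predict_route = resolve_serving_config()
print(port, health_route, predict_route)
```

With something like this, the Flask app could start with `app.run(host="0.0.0.0", port=port)` and register its routes from the returned values. Alternatively, the port and routes can be declared on the model when it is uploaded so that they match 5001 and `/predict`.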

ensemble_deploy_1.py:

from flask import Flask, request, jsonify
import torch
from torchvision import models, transforms
from PIL import Image
import io
import torch.nn.functional as F
import logging
from logging.handlers import RotatingFileHandler

app = Flask(__name__)

# Define global device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define the preprocess function
def preprocess_image(image_bytes):
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    image = preprocess(image).unsqueeze(0)
    return image.to(device)

# Model initialization functions
def initialize_model(model_name):
    if model_name == "densenet":
        model = models.densenet121(pretrained=False)
        num_ftrs = model.classifier.in_features
        model.classifier = torch.nn.Linear(num_ftrs, 7)
    elif model_name == "resnext":
        model = models.resnext50_32x4d(pretrained=False)
        num_ftrs = model.fc.in_features
        model.fc = torch.nn.Linear(num_ftrs, 7)
    elif model_name == "mobilenetv2":
        model = models.mobilenet_v2(pretrained=False)
        num_ftrs = model.classifier[1].in_features
        model.classifier = torch.nn.Sequential(
            torch.nn.Dropout(0.2),
            torch.nn.Linear(num_ftrs, 7)
        )
    else:
        raise ValueError("Invalid model name")
    return model.to(device).eval()

# Initialize and load models outside the request handler
model_paths = {
    'densenet': 'DenseNet_Optimal.pt',
    'resnext': 'ResNext_Optimal.pt',
    'mobilenetv2': 'MobileNetV2_Optimal.pt'
}

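# Note: this rebinds the name `models`, shadowing the torchvision.models module
# imported above; initialize_model() must not be called after this line.
models = {name: initialize_model(name) for name in model_paths}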
for name, model in models.items():
    model.load_state_dict(torch.load(model_paths[name], map_location=device))

# Define your F1 scores
all_model_f1_scores = {
    'Angry': {'DenseNet': 0.62, 'ResNext': 0.63, 'MobileNetV2': 0.60},
    'Disgust': {'DenseNet': 0.57, 'ResNext': 0.73, 'MobileNetV2': 0.63},
    'Fear': {'DenseNet': 0.51, 'ResNext': 0.54, 'MobileNetV2': 0.51},
    'Happy': {'DenseNet': 0.89, 'ResNext': 0.89, 'MobileNetV2': 0.88},
    'Neutral': {'DenseNet': 0.67, 'ResNext': 0.66, 'MobileNetV2': 0.66},
    'Sad': {'DenseNet': 0.57, 'ResNext': 0.58, 'MobileNetV2': 0.58},
    'Surprise': {'DenseNet': 0.81, 'ResNext': 0.81, 'MobileNetV2': 0.79},
}

class_names = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']

def predict_with_ensemble(image_tensor, models, f1_scores, class_names):
    # Ensure the image tensor is on the correct device
    image_tensor = image_tensor.to(device)
    weighted_preds = torch.zeros(1, len(class_names), device=device)
    
    for model_name, model in models.items():
        with torch.no_grad():
            outputs = model(image_tensor)
            probs = F.softmax(outputs, dim=1)
            for i, class_name in enumerate(class_names):
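                # F1 keys are capitalized ('DenseNet') while model keys are lowercase
                # ('densenet'), so match case-insensitively; default to 1 (no weighting)
                class_f1 = {k.lower(): v for k, v in f1_scores.get(class_name, {}).items()}
                f1_weight = class_f1.get(model_name.lower(), 1)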
                weighted_preds[:, i] += probs[:, i] * f1_weight

    final_pred = torch.argmax(weighted_preds, dim=1)
    predicted_class = class_names[final_pred.item()]
    return predicted_class

# Setup logging
handler = RotatingFileHandler('app.log', maxBytes=10000, backupCount=3)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(handler)

@app.route('/predict', methods=['POST'])
def predict():
    logger.info("Received prediction request")
    # The route only accepts POST, so no method check is needed here
    if 'file' not in request.files:
        return jsonify({'error': 'No file part'}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No selected file'}), 400
    image_bytes = file.read()
    image_tensor = preprocess_image(image_bytes)
    predicted_class = predict_with_ensemble(image_tensor, models, all_model_f1_scores, class_names)
    return jsonify({'predicted_class': predicted_class})

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5001)
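
The ensemble step in `predict_with_ensemble` is F1-weighted soft voting: each model's class probabilities are scaled by that model's per-class F1 score and summed, and the class with the largest total wins. The sketch below reproduces that arithmetic with plain Python lists so it can be checked without PyTorch. `weighted_vote` and the toy probabilities are made up for illustration; the F1 values are taken from the table in the script. It matches model names case-insensitively, because the model dictionary uses lowercase keys (`'densenet'`) while the F1 table uses capitalized ones (`'DenseNet'`), and an exact-match lookup between the two would always fall back to the default weight of 1.

```python
def weighted_vote(probs_by_model, f1_scores, class_names):
    """Return (winning class, per-class totals) for F1-weighted soft voting."""
    totals = [0.0] * len(class_names)
    for model_name, probs in probs_by_model.items():
        for i, class_name in enumerate(class_names):
            # Normalize key case so 'densenet' finds the 'DenseNet' score
            class_f1 = {k.lower(): v for k, v in f1_scores.get(class_name, {}).items()}
            weight = class_f1.get(model_name.lower(), 1.0)
            totals[i] += probs[i] * weight
    best = max(range(len(class_names)), key=lambda i: totals[i])
    return class_names[best], totals


# Toy example: two classes, two models (probabilities are illustrative only)
class_names = ["Disgust", "Fear"]
f1_scores = {
    "Disgust": {"DenseNet": 0.57, "ResNext": 0.73},
    "Fear": {"DenseNet": 0.51, "ResNext": 0.54},
}
probs_by_model = {
    "densenet": [0.40, 0.60],
    "resnext": [0.55, 0.45],
}

winner, totals = weighted_vote(probs_by_model, f1_scores, class_names)
print(winner, totals)
```

In this toy case the weighted vote picks "Disgust", whereas an unweighted sum of the same probabilities would pick "Fear", so whether the weights are actually applied can change the prediction.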

Logs:

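INFO * Serving Flask app 'ensemble_deploy_1.py'
INFO * Debug mode: off
ERROR * CUDA initialization: Found no NVIDIA driver on your system.
WARNING: This is a development server. Do not use it in a production deployment.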

I have tried deploying in regions other than asia-southeast1 and with other machine types. I also tested the container locally, and it works.

I have verified that the FLASK_APP environment variable is set and that the container runs as expected locally. However, deployment to Vertex AI fails.
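
One difference between a local test and Vertex AI that may matter: a local test can POST a multipart form with a `file` field, which is what the `/predict` handler reads, but Vertex AI online prediction forwards a JSON body of the form `{"instances": [...]}` and expects a JSON response of the form `{"predictions": [...]}`. Below is a rough sketch of adapting to that shape, assuming each instance carries the image as base64 under a hypothetical `image_b64` key. That key name and both helper names are invented for this example and are not part of the Vertex AI API.

```python
import base64
import json


def decode_instances(body_bytes):
    """Extract raw image bytes from a {"instances": [...]} request body."""
    payload = json.loads(body_bytes)
    instances = payload.get("instances")
    if not isinstance(instances, list) or not instances:
        raise ValueError('request body must contain a non-empty "instances" list')
    # "image_b64" is an assumed field name chosen for this sketch
    return [base64.b64decode(item["image_b64"]) for item in instances]


def build_response(predicted_classes):
    """Wrap per-instance results in the {"predictions": [...]} envelope."""
    return json.dumps(
        {"predictions": [{"predicted_class": c} for c in predicted_classes]}
    )


# Round trip with fake bytes standing in for an image file
request_body = json.dumps(
    {"instances": [{"image_b64": base64.b64encode(b"fake-image-bytes").decode("ascii")}]}
).encode("utf-8")

images = decode_instances(request_body)
print(images[0])
print(build_response(["Happy"]))
```

In the Flask handler, `request.get_data()` could supply `body_bytes`, and each decoded image could then go through `preprocess_image` and `predict_with_ensemble` as before.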

I'm not sure whether deploying a Flask-based container to a Vertex AI endpoint requires additional configuration. I'm a bit lost at this point.
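
On the question of extra configuration: as far as I understand, a custom container on Vertex AI has to answer a health check (an HTTP GET on the health route that returns 200) in addition to serving predictions, and the script above only defines `/predict`. If the health check never succeeds, the deployment can fail even though the server process started. Below is a minimal stand-in for that contract that uses the standard library's `http.server` instead of Flask, so it runs without extra packages. The `/health` and `/predict` paths, the handler name, and the canned prediction are assumptions for illustration. In the Flask app the equivalent would be one more `@app.route` that returns a 200.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_ROUTE = "/health"    # assumed; Vertex AI passes the real one in AIP_HEALTH_ROUTE
PREDICT_ROUTE = "/predict"  # assumed; Vertex AI passes the real one in AIP_PREDICT_ROUTE


class ContractHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: a 200 response signals the container is ready
        if self.path == HEALTH_ROUTE:
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != PREDICT_ROUTE:
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        instances = json.loads(self.rfile.read(length)).get("instances", [])
        # Canned answer per instance; a real handler would run the ensemble here
        self._send_json(200, {"predictions": ["Happy" for _ in instances]})

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Opener that ignores any system proxy so the loopback request stays local
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, data=None):
    """Return (status, parsed JSON) for a GET, or a POST when data is given."""
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with _opener.open(req) as resp:
        return resp.status, json.loads(resp.read())


# Bind to port 0 so the OS picks a free port for this local demo
server = HTTPServer(("127.0.0.1", 0), ContractHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

health = fetch(base + HEALTH_ROUTE)
prediction = fetch(
    base + PREDICT_ROUTE, json.dumps({"instances": [{}, {}]}).encode("utf-8")
)
print(health, prediction)

server.shutdown()
server.server_close()
```

Running the same two requests against the real container locally, on the port and routes registered with the model, is a quick way to check the contract before attempting another deployment.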

docker google-cloud-platform pytorch flask-restful google-cloud-vertex-ai
1 Answer
0 votes

Did you manage to solve this? I'm stuck on the same thing.
