Error during the model serving build caused by detectron2


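Introduction: I am trying to register my model on Databricks so that I can serve it as an endpoint. The packages I need are "torch", "mlflow", "torchvision", "numpy" and "git+https://github.com/facebookresearch/detectron2.git". To do this, I created a notebook on Databricks; part of the code is shown below: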

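import joblib
import mlflow
import numpy as np


class DetectronModel(mlflow.pyfunc.PythonModel):

    def load_context(self, context):
        # Load the pickled detectron2 predictor from the logged artifacts.
        self.predictor = joblib.load(context.artifacts["predictor"])

    def predict(self, context, model_input):
        # Convert the "image" column to an array and run the predictor on it.
        image = np.array(model_input["image"])
        result = self.predictor(image)
        instances = result['instances']
        return instances
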

predictor_path = f"/dbfs/mnt/{container_name}/segmentationWeightsPath/predictor.pkl"

conda_env = {
    "channels": ["defaults"],
    "dependencies": [
        "python=3.8",
        "pip",
        {
            "pip": [
                "torch",
                "mlflow",
                "torchvision",
                "numpy",
                "git+https://github.com/facebookresearch/detectron2.git"
            ]
        }
    ]
}

model_info = mlflow.pyfunc.log_model(
    artifact_path="detectron_model_artifact",
    python_model=DetectronModel(),
    artifacts={"predictor": predictor_path},
    conda_env=conda_env
)

model_name = "detectron_model"
model_version = mlflow.register_model(
    model_uri=model_info.model_uri,
    name=model_name
)

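Problem: After registering the model and trying to serve it, the build fails while installing detectron2 with

ModuleNotFoundError: No module named 'torch'

torch is clearly listed in conda_env, so I am confused about why I get this error.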

I have attached the build log for reference.

#20 0.390 channels:
#20 0.390 - defaults
#20 0.390 dependencies:
#20 0.390 - python=3.8
#20 0.390 - pip
#20 0.390 - pip:
#20 0.390   - torch
#20 0.390   - mlflow
#20 0.390   - torchvision
#20 0.390   - numpy
#20 0.390   - git+https://github.com/facebookresearch/detectron2.git
#20 0.647 Collecting package metadata (repodata.json): ...working... done
#20 5.820 Solving environment: ...working... done
#20 6.192 
#20 6.192 
#20 6.192 ==> WARNING: A newer version of conda exists. <==
#20 6.192   current version: 4.10.3
#20 6.192   latest version: 24.5.0
#20 6.192 
#20 6.192 Please update conda by running
#20 6.192 
#20 6.192     $ conda update -n base -c defaults conda
#20 6.192 
#20 6.192 
#20 6.202 
#20 6.202 Downloading and Extracting Packages
#20 6.202 
ncurses-6.4          | 914 KB    |            |   0% 
ncurses-6.4          | 914 KB    | ########## | 100% 
ncurses-6.4          | 914 KB    | ########## | 100% 
#20 6.395 
zlib-1.2.13          | 111 KB    |            |   0% 
zlib-1.2.13          | 111 KB    | ########## | 100% 
#20 6.423 
libffi-3.4.4         | 141 KB    |            |   0% 
libffi-3.4.4         | 141 KB    | ########## | 100% 
#20 6.466 
ca-certificates-2024 | 127 KB    |            |   0% 
ca-certificates-2024 | 127 KB    | ########## | 100% 
#20 6.490 
readline-8.2         | 357 KB    |            |   0% 
readline-8.2         | 357 KB    | ########## | 100% 
#20 6.521 
wheel-0.43.0         | 109 KB    |            |   0% 
wheel-0.43.0         | 109 KB    | ########## | 100% 
#20 6.548 
_openmp_mutex-5.1    | 21 KB     |            |   0% 
_openmp_mutex-5.1    | 21 KB     | ########## | 100% 
#20 6.574 
libgomp-11.2.0       | 474 KB    |            |   0% 
libgomp-11.2.0       | 474 KB    | ########## | 100% 
#20 6.608 
ld_impl_linux-64-2.3 | 654 KB    |            |   0% 
ld_impl_linux-64-2.3 | 654 KB    | ########## | 100% 
#20 6.635 
setuptools-69.5.1    | 1002 KB   |            |   0% 
setuptools-69.5.1    | 1002 KB   | ########## | 100% 
#20 6.699 
python-3.8.19        | 23.8 MB   |            |   0% 
python-3.8.19        | 23.8 MB   | ########## | 100% 
python-3.8.19        | 23.8 MB   | ########## | 100% 
#20 7.085 
tk-8.6.14            | 3.4 MB    |            |   0% 
tk-8.6.14            | 3.4 MB    | ########## | 100% 
#20 7.172 
openssl-3.0.13       | 5.2 MB    |            |   0% 
openssl-3.0.13       | 5.2 MB    | ########## | 100% 
#20 7.267 
sqlite-3.45.3        | 1.2 MB    |            |   0% 
sqlite-3.45.3        | 1.2 MB    | ########## | 100% 
#20 7.304 
libgcc-ng-11.2.0     | 5.3 MB    |            |   0% 
libgcc-ng-11.2.0     | 5.3 MB    | ########## | 100% 
libgcc-ng-11.2.0     | 5.3 MB    | ########## | 100% 
#20 7.424 
xz-5.4.6             | 643 KB    |            |   0% 
xz-5.4.6             | 643 KB    | ########## | 100% 
#20 7.463 
libstdcxx-ng-11.2.0  | 4.7 MB    |            |   0% 
libstdcxx-ng-11.2.0  | 4.7 MB    | ########## | 100% 
#20 7.553 
pip-24.0             | 2.6 MB    |            |   0% 
pip-24.0             | 2.6 MB    | ########## | 100% 
pip-24.0             | 2.6 MB    | ########## | 100% 
#20 7.675 Preparing transaction: ...working... done
#20 7.841 Verifying transaction: ...working... done
#20 8.559 Executing transaction: ...working... done
#20 8.934 Installing pip dependencies: ...working... Pip subprocess error:
#20 10.81   Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git /tmp/pip-req-build-uc1k_xq1
#20 10.81   error: subprocess-exited-with-error
#20 10.81   
#20 10.81   × python setup.py egg_info did not run successfully.
#20 10.81   │ exit code: 1
#20 10.81   ╰─> [6 lines of output]
#20 10.81       Traceback (most recent call last):
#20 10.81         File "<string>", line 2, in <module>
#20 10.81         File "<pip-setuptools-caller>", line 34, in <module>
#20 10.81         File "/tmp/pip-req-build-uc1k_xq1/setup.py", line 10, in <module>
#20 10.81           import torch
#20 10.81       ModuleNotFoundError: No module named 'torch'
#20 10.81       [end of output]
#20 10.81   
#20 10.81   note: This error originates from a subprocess, and is likely not a problem with pip.
#20 10.81 error: metadata-generation-failed
#20 10.81 
#20 10.81 × Encountered error while generating package metadata.
#20 10.81 ╰─> See above for output.
#20 10.81 
#20 10.81 note: This is an issue with the package mentioned above, not pip.
#20 10.81 hint: See above for details.
#20 10.81 
#20 10.81 Ran pip subprocess with arguments:
#20 10.81 ['/opt/conda/envs/mlflow-env/bin/python', '-m', 'pip', 'install', '-U', '-r', '/model/condaenv.detud3st.requirements.txt']
#20 10.81 Pip subprocess output:
#20 10.81 Collecting git+https://github.com/facebookresearch/detectron2.git (from -r /model/condaenv.detud3st.requirements.txt (line 5))
#20 10.81   Cloning https://github.com/facebookresearch/detectron2.git to /tmp/pip-req-build-uc1k_xq1
#20 10.81   Resolved https://github.com/facebookresearch/detectron2.git to commit 79f914785a87b80565381f4489b129e633c4efb5
#20 10.81   Preparing metadata (setup.py): started
#20 10.81   Preparing metadata (setup.py): finished with status 'error'
#20 10.81 
#20 10.81 failed
#20 10.81 
#20 10.81 CondaEnvException: Pip failed
#20 10.81 
#20 ERROR: process "/bin/sh -c echo $BUILD_LOG_START_DELIMITER && cat model/conda.yaml && conda env create -f model/conda.yaml -n mlflow-env && echo $BUILD_LOG_CONDA_END_DELIMITER && echo $BUILD_LOG_END_DELIMITER && conda clean -afy" did not complete successfully: exit code: 1
------

It looks like the detectron2 build is triggered before torch is installed.
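One way to confirm this reading of the log is to check whether the failure happened while pip was still preparing package metadata, and which module was missing at that point. The snippet below is a minimal sketch of that check; `missing_build_module` is a hypothetical helper written for illustration, not part of MLflow or Databricks, and `sample_log` is a shortened excerpt of the log above.

```python
import re
from typing import Optional

# Hypothetical helper (not part of MLflow or Databricks): scans build-log text
# and reports which module was missing while pip prepared package metadata.
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")
METADATA_FAILED = "metadata-generation-failed"


def missing_build_module(log_text: str) -> Optional[str]:
    """Return the missing module name if the build failed during metadata
    generation (before any pip package was installed), else None."""
    if METADATA_FAILED not in log_text:
        return None
    match = MISSING_MODULE.search(log_text)
    return match.group(1) if match else None


# Shortened excerpt of the build log shown in the question.
sample_log = """\
#20 10.81   × python setup.py egg_info did not run successfully.
#20 10.81           import torch
#20 10.81       ModuleNotFoundError: No module named 'torch'
#20 10.81 error: metadata-generation-failed
"""

print(missing_build_module(sample_log))
```

If this reports a module, that module has to be present in the environment before the pip step starts, regardless of where it appears in the pip list.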

I would appreciate some help with this problem and am happy to share more information.

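I built a wheel, but the build process did not pick it up. I also tried downgrading library versions, but that did not solve the problem (and it would not be a real solution anyway, since the problem may lie elsewhere).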

databricks torch
1 Answer

Found the answer!

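Basically, pip processed the dependency from the git repository first instead of following the listed order: as the log shows, detectron2's setup.py imports torch while pip is still preparing metadata, before any of the listed pip packages has been installed. To fix this, I moved the libraries it needs into the conda dependencies, which are installed before the pip step runs: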

conda_env = {
    "channels": [
        "defaults",
        "pytorch"
    ],
    "dependencies": [
        "python=3.8",
        "numpy==1.24.3",
        "pytorch==2.2.2",
        "pip",
        {
            "pip": [
                "fvcore==0.1.5.post20221221",
                "git+https://github.com/wookayin/gpustat",
                "pycocotools==2.0.6",
                "torchvision==0.15.2",
                "git+https://github.com/facebookresearch/detectron2.git"
            ]
        }
    ]
}
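The same change can be sketched as a small transformation of the original conda_env dictionary. This is only an illustration: `promote_to_conda` is a hypothetical helper, not an MLflow API, and the shortened `original` dictionary stands in for the one from the question. It assumes the package is published on conda under the given name (here `pytorch` on the `pytorch` channel, as in the answer).

```python
import copy


def promote_to_conda(env, pip_name, conda_spec, channel=None):
    """Hypothetical helper: return a copy of `env` in which `pip_name` is
    removed from the pip section and `conda_spec` is added as a conda
    dependency ahead of the plain "pip" entry."""
    new_env = copy.deepcopy(env)
    deps = new_env["dependencies"]
    # Locate the nested {"pip": [...]} entry.
    pip_section = next(d for d in deps if isinstance(d, dict) and "pip" in d)
    pip_section["pip"] = [p for p in pip_section["pip"] if p != pip_name]
    # Conda packages are installed before the pip subprocess runs.
    deps.insert(deps.index("pip"), conda_spec)
    if channel and channel not in new_env["channels"]:
        new_env["channels"].append(channel)
    return new_env


# Shortened version of the conda_env from the question.
original = {
    "channels": ["defaults"],
    "dependencies": [
        "python=3.8",
        "pip",
        {
            "pip": [
                "torch",
                "mlflow",
                "git+https://github.com/facebookresearch/detectron2.git",
            ]
        },
    ],
}

fixed = promote_to_conda(original, "torch", "pytorch==2.2.2", channel="pytorch")
print(fixed["channels"])
print(fixed["dependencies"])
```

Note that a torchvision wheel installed through pip pins a specific torch version, so the conda pytorch version and the pip torchvision version should be chosen as a matching pair; otherwise pip may try to install a different torch on top of the conda one.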