Combining Metaflow and MLflow in Databricks


I need to write a script in a Databricks notebook that combines Metaflow and MLflow.

Here is the script:

import mlflow
from metaflow import FlowSpec, step, Parameter
import pandas as pd
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris


class TrainFlow(FlowSpec):

    @step
    def start(self):
        iris = load_iris()
        iris_df = pd.DataFrame(data= np.c_[iris['data'], iris['target']], columns= iris['feature_names'] + ['target'])

        X_train, X_test, y_train, y_test = train_test_split(iris_df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']], iris_df['target'])

        # Create a model
        model = Ridge(alpha=0.1)

        # Train the model on the training data
        model.fit(X_train, y_train)

        # Make predictions on the testing data
        y_pred = model.predict(X_test)

        # Evaluate the model on the testing data (Ridge.score returns R^2, not classification accuracy)
        accuracy = model.score(X_test, y_test)

        self.next(self.end)

    @step
    def end(self):
        print('End of flow')

if __name__ == "__main__":
    TrainFlow()

I run this script from a Databricks notebook cell with the following commands:

%env USERNAME='xyz'
!python /dbfs/FileStore/xxx/metaflow_mlflow_workflow.py --no-pylint run

This script runs fine.

Now I add MLflow logging to the script:

import mlflow
from metaflow import FlowSpec, step, Parameter
import pandas as pd
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris


class TrainFlow(FlowSpec):

    @step
    def start(self):
        iris = load_iris()
        iris_df = pd.DataFrame(data= np.c_[iris['data'], iris['target']], columns= iris['feature_names'] + ['target'])

        X_train, X_test, y_train, y_test = train_test_split(iris_df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']], iris_df['target'])

        # Create a model
        model = Ridge(alpha=0.1)

        # Train the model on the training data
        model.fit(X_train, y_train)

        # Make predictions on the testing data
        y_pred = model.predict(X_test)

        # Evaluate the model on the testing data (Ridge.score returns R^2, not classification accuracy)
        accuracy = model.score(X_test, y_test)
        
        # Set the name (passed below as the MLflow run name, not as an experiment)
        experiment_name = "Iris Classification"

        # Log the metrics and model using MLflow
        with mlflow.start_run(run_name = experiment_name):
        
            mlflow.log_metric("accuracy_mean", 0.1)
            mlflow.log_metric("accuracy_std", 0.2)

            # Log the model's hyperparameters
            mlflow.log_param("random_state", 0.3)
            mlflow.log_param("n_estimators", 0.4)
            mlflow.log_param("eval_metric", 0.5)
            mlflow.log_param("k_fold", 0.6)

        self.next(self.end)

    @step
    def end(self):
        print('End of flow')

if __name__ == "__main__":
    TrainFlow()

As before, I run the script from a Databricks notebook cell with the same commands:

%env USERNAME='xyz'
!python /dbfs/FileStore/xxx/metaflow_mlflow_workflow.py --no-pylint run

Unfortunately, the script crashes with this error:

env: USERNAME='xyz'
Metaflow 2.8.0 executing TrainFlow for user:'xyz'
Validating your flow...
    The graph looks good!
2023-04-06 07:50:51.288 Workflow starting (run-id 1680767451283182):
2023-04-06 07:50:51.302 [1680767451283182/start/1 (pid 2012)] Task is starting.
2023-04-06 07:50:53.940 [1680767451283182/start/1 (pid 2012)] <flow TrainFlow step start> failed:
2023-04-06 07:50:53.945 [1680767451283182/start/1 (pid 2012)] Internal error
2023-04-06 07:50:53.946 [1680767451283182/start/1 (pid 2012)] Traceback (most recent call last):
2023-04-06 07:50:53.946 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/cli.py", line 1172, in main
2023-04-06 07:50:53.946 [1680767451283182/start/1 (pid 2012)] start(auto_envvar_prefix="METAFLOW", obj=state)
2023-04-06 07:50:53.946 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 829, in __call__
2023-04-06 07:50:53.946 [1680767451283182/start/1 (pid 2012)] return self.main(*args, **kwargs)
2023-04-06 07:50:54.223 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 782, in main
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] rv = self.invoke(ctx)
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 1259, in invoke
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] return _process_result(sub_ctx.command.invoke(sub_ctx))
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 1066, in invoke
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] return ctx.invoke(self.callback, ctx.params)
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/_vendor/click/core.py", line 610, in invoke
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] return callback(*args, **kwargs)
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/_vendor/click/decorators.py", line 21, in new_func
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] return f(get_current_context(), *args, **kwargs)
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/cli.py", line 581, in step
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] task.run_step(
2023-04-06 07:50:54.224 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/task.py", line 586, in run_step
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] self._exec_step_function(step_func)
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/metaflow/task.py", line 60, in _exec_step_function
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] step_function()
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] File "/dbfs/FileStore/xxx/metaflow_mlflow_workflow.py", line 35, in start
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] with mlflow.start_run(run_name = experiment_name):
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 350, in start_run
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] active_run_obj = client.create_run(
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/mlflow/tracking/client.py", line 275, in create_run
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] return self._tracking_client.create_run(experiment_id, start_time, tags, run_name)
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 131, in create_run
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] return self.store.create_run(
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 175, in create_run
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] response_proto = self._call_endpoint(CreateRun, req_body)
2023-04-06 07:50:54.225 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
2023-04-06 07:50:54.226 [1680767451283182/start/1 (pid 2012)] return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
2023-04-06 07:50:54.226 [1680767451283182/start/1 (pid 2012)] File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/mlflow/utils/databricks_utils.py", line 413, in get_databricks_host_creds
2023-04-06 07:50:54.226 [1680767451283182/start/1 (pid 2012)] config = provider.get_config()
2023-04-06 07:50:54.226 [1680767451283182/start/1 (pid 2012)] File "/databricks/python/lib/python3.9/site-packages/databricks_cli/configure/provider.py", line 134, in get_config
2023-04-06 07:50:54.226 [1680767451283182/start/1 (pid 2012)] raise InvalidConfigurationError.for_profile(None)
2023-04-06 07:50:54.226 [1680767451283182/start/1 (pid 2012)] databricks_cli.utils.InvalidConfigurationError: You haven't configured the CLI yet! Please configure by entering `/dbfs/FileStore/xxx/metaflow_mlflow_workflow.py configure`
2023-04-06 07:50:54.226 [1680767451283182/start/1 (pid 2012)] 
2023-04-06 07:50:54.226 [1680767451283182/start/1 (pid 2012)] Task failed.
2023-04-06 07:50:54.227 Workflow failed.
2023-04-06 07:50:54.227 Terminating 0 active tasks...
2023-04-06 07:50:54.227 Flushing logs...
    Step failure:
    Step start (task-id 1) failed.
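
Read from the bottom, the traceback ends in databricks_cli's get_config raising InvalidConfigurationError. This suggests that the Python process started by `!python` found no Databricks credentials at the moment MLflow tried to create the run. Below is a minimal sketch of a pre-flight check, assuming the credentials are meant to come from the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables. The helper name missing_databricks_credentials is hypothetical and is not part of MLflow, Metaflow, or databricks-cli.

```python
import os
from typing import List, Mapping, Optional

# Assumption: credentials are supplied via these two environment variables,
# which the databricks-cli environment provider reads.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_credentials(
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variable names that are unset or empty in env.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling this at the top of the start step and raising a readable error when the list is non-empty would surface the missing configuration before MLflow is reached.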

Apparently, I am doing something wrong. How can I combine Metaflow and MLflow so that the script runs from a Databricks notebook cell?
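
One possible direction, offered as a sketch rather than a confirmed fix, is to launch the flow from a notebook cell with the Databricks credentials passed explicitly in the child process environment. The flow runs in a separate Python process, so it may not inherit the notebook's own authentication context. The function names build_flow_env, build_flow_command, and run_flow are hypothetical, and the host URL and token are placeholders the reader must supply. The sketch assumes a personal access token is available and that MLflow picks up MLFLOW_TRACKING_URI, DATABRICKS_HOST, and DATABRICKS_TOKEN from the environment.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional


def build_flow_env(
    host: str, token: str, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Copy the base environment and add the variables MLflow is assumed to read."""
    env = dict(os.environ if base_env is None else base_env)
    env["MLFLOW_TRACKING_URI"] = "databricks"  # log to the workspace tracking server
    env["DATABRICKS_HOST"] = host              # placeholder: workspace URL
    env["DATABRICKS_TOKEN"] = token            # placeholder: personal access token
    return env


def build_flow_command(script_path: str) -> List[str]:
    """Same invocation as in the question, using the current interpreter."""
    return [sys.executable, script_path, "--no-pylint", "run"]


def run_flow(script_path: str, host: str, token: str) -> int:
    """Run the Metaflow script in a child process and return its exit code."""
    completed = subprocess.run(
        build_flow_command(script_path),
        env=build_flow_env(host, token),
        check=False,
    )
    return completed.returncode
```

In a notebook cell this would be called as run_flow("/dbfs/FileStore/xxx/metaflow_mlflow_workflow.py", host, token), with the token read from a secret store rather than typed into the notebook. USERNAME is inherited from the notebook environment because the base environment is copied.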

databricks mlflow netflix-metaflow