How can I modify the azureml Python SDK v2 to serve a custom use case?


I want to modify azureml.data.dataset_factory.register_pandas_dataframe() for my use case so that, in addition to the registered_dataset it returns by default, it also returns relative_path_with_guid.

The default definition of azureml.data.dataset_factory.register_pandas_dataframe() is:

    @staticmethod
    @track(_get_logger, custom_dimensions={'app_name': 'TabularDataset'}, activity_type=_PUBLIC_API)
    def register_pandas_dataframe(dataframe, target, name, description=None, tags=None, show_progress=True):
        """Create a dataset from pandas dataframe.

        :param dataframe: Required, in memory dataframe to be uploaded.
        :type dataframe: pandas.DataFrame
        :param target: Required, the datastore path where the dataframe parquet data will be uploaded to.
            A guid folder will be generated under the target path to avoid conflict.
        :type target: typing.Union[azureml.data.datapath.DataPath, azureml.core.datastore.Datastore,
            tuple(azureml.core.datastore.Datastore, str)]
        :param name: Required, the name of the registered dataset.
        :type name: str
        :param description: Optional. A text description of the dataset. Defaults to None.
        :type description: str
        :param tags: Optional. Dictionary of key value tags to give the dataset. Defaults to None.
        :type tags: dict[str, str]
        :param show_progress: Optional, indicates whether to show progress of the upload in the console.
            Defaults to be True.
        :type show_progress: bool
        :return: The registered dataset.
        :rtype: azureml.data.TabularDataset
        """
        import pandas as pd
        from azureml.data.datapath import DataPath
        from uuid import uuid4

        console = get_progress_logger(show_progress)
        console("Validating arguments.")
        _check_type(dataframe, "dataframe", pd.core.frame.DataFrame)
        _check_type(name, "name", str)
        datastore, relative_path = parse_target(target, True)
        console("Arguments validated.")

        guid = uuid4()
        relative_path_with_guid = "%s/%s/" % (relative_path, guid)
        console("Successfully obtained datastore reference and path.")

        console("Uploading file to {}".format(relative_path_with_guid))
        sanitized_df = _sanitize_pandas(dataframe)
        dflow = dataprep().read_pandas_dataframe(df=sanitized_df, in_memory=True)
        target_directory_path = DataReference(datastore=datastore).path(relative_path_with_guid)
        dflow.write_to_parquet(directory_path=target_directory_path).run_local()
                
        console("Successfully uploaded file to datastore.")

        console("Creating and registering a new dataset.")
        datapath = DataPath(datastore, relative_path_with_guid)
        saved_dataset = TabularDatasetFactory.from_parquet_files(datapath)
        registered_dataset = saved_dataset.register(datastore.workspace, name,
                                                    description=description,
                                                    tags=tags,
                                                    create_new_version=True)
        console("Successfully created and registered a new dataset.")

        return registered_dataset

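I understand that editing the installed source code directly is not good practice and that I should instead change the package in development mode. Even if that is an option, I do not know where to find the setup.py for the azureml-sdk package. I get an error when I run: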

pip install azureml-sdk -e /path/to/azureml-dev/folder

ERROR: File "setup.py" not found. Directory cannot be installed in editable mode: /path/to/azureml-dev/folder

Has anyone done a similar experiment with adapting azureml-sdk? How did you solve the setup.py problem?
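For context on the error: the -e flag makes pip look in the given local directory for build metadata, so a folder without setup.py is rejected no matter which package name is also on the command line. The sketch below is only a diagnostic aid, not part of the SDK. It uses the standard library to show where an importable module is loaded from and whether a folder looks like an installable project. The helper names locate_module_source and has_build_metadata are hypothetical, and accepting pyproject.toml as an alternative to setup.py is an assumption that holds only for newer pip versions.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_module_source(module_name: str) -> Optional[Path]:
    """Return the file an importable module is loaded from, or None.

    None is returned when the module (or its parent package) is not
    installed, or when it has no file on disk (built-in or frozen).
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised when a parent package such as "azureml" is missing.
        return None
    if spec is None or spec.origin in (None, "built-in", "frozen"):
        return None
    return Path(spec.origin)


def has_build_metadata(directory: Path) -> bool:
    """True if the directory contains a file an editable install can use.

    Assumption: pyproject.toml is accepted only by newer pip versions;
    older versions require setup.py, as the error message above shows.
    """
    return any(
        (directory / filename).is_file()
        for filename in ("setup.py", "pyproject.toml")
    )


if __name__ == "__main__":
    # Prints the installed file path if azureml-core is available, else None.
    print(locate_module_source("azureml.data.dataset_factory"))
    print(has_build_metadata(Path(".")))
```

If the first call prints a path inside site-packages, that is where the installed copy of the function lives. That directory is an installed package rather than a project folder, so it is not expected to contain a setup.py.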

1 Answer

Because azureml SDK v2 is a closed-source Python module, its code cannot be modified.
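A possible workaround that leaves the package untouched is to choose the unique folder yourself and pass it in as part of the target. The docstring in the question says target may be a (datastore, path) tuple and that the SDK creates a GUID folder under that path. So if the caller supplies its own unique prefix, the upload location is known up to that prefix. The following is a minimal sketch under those assumptions. register_with_known_prefix is a hypothetical helper, and register_fn stands in for the SDK function so the sketch runs without azureml installed.

```python
from typing import Any, Callable, Tuple
from uuid import uuid4


def register_with_known_prefix(
    register_fn: Callable[..., Any],
    dataframe: Any,
    datastore: Any,
    base_path: str,
    name: str,
    **kwargs: Any,
) -> Tuple[Any, str]:
    """Register a dataframe and return (dataset, caller-chosen prefix).

    register_fn is assumed to have the signature quoted in the question:
    register_fn(dataframe, target, name, description=None, tags=None,
    show_progress=True), with target given as a (datastore, path) tuple.
    The SDK still appends its own GUID folder below the returned prefix,
    so the prefix is the parent of the final upload folder.
    """
    # Build a unique folder under base_path that the caller controls.
    prefix = "%s/%s" % (base_path.rstrip("/"), uuid4())
    dataset = register_fn(dataframe, (datastore, prefix), name, **kwargs)
    return dataset, prefix
```

With azureml-core installed, the call would look roughly like register_with_known_prefix(Dataset.Tabular.register_pandas_dataframe, df, datastore, "my/data", "my_dataset"), where Dataset comes from azureml.core. The returned prefix is not identical to relative_path_with_guid, but it contains exactly one subfolder, the one the SDK generated.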
