Hydra does not allow any command-line script to be used in Azure ML

Question

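I am trying to pass data from one component to the next in an Azure ML pipeline. I can do this with simple code.

I have 2 components, which I define as follows:

from azure.ai.ml import load_component

components_dir = "."
prep = load_component(source=f"{components_dir}/preprocessing_config.yml")
middle = load_component(source=f"{components_dir}/middle_config.yml")

Then I define a pipeline as follows: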

from azure.ai.ml import Output
from azure.ai.ml.dsl import pipeline


@pipeline(
    display_name="test_pipeline3",
    tags={"authoring": "sdk"},
    description="test pipeline to test things just like all other test pipelines.",
)
def data_pipeline(
    # raw_data: Input,
    compute_train_node: str,
):
    prep_node = prep()

    # Mount both outputs of the prep step as read-write folders
    prep_node.outputs.Y_df = Output(type="uri_folder", mode="rw_mount", path="path/testing/")
    prep_node.outputs.S_df = Output(type="uri_folder", mode="rw_mount", path="path/testing/")

    # Feed the prep outputs into the next component
    transform_node = middle(
        Y_df=prep_node.outputs.Y_df,
        S_df=prep_node.outputs.S_df,
    )

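The prep node has a script that uses Hydra to read parameters from a config file. The component also has a config file that launches the script from the command line, as follows: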

  python preprocessing_script.py
  --Y_df ${{outputs.Y_df}} 
  --S_df ${{outputs.S_df}}

I try to get the values of Y_df.path and S_df.path in the main function of the prep script as follows:

import argparse
from pathlib import Path

import hydra
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path=".", config_name="config_file")
def main(cfg: DictConfig):
    # Parse the output folders passed by the component command
    parser = argparse.ArgumentParser("prep")
    parser.add_argument("--Y_df", type=str, help="Path of prepped data")
    parser.add_argument("--S_df", type=str, help="Path of prepped data")
    args = parser.parse_args()

    # Call the preprocessing function with Hydra configurations
    df1, df2 = processing_func(cfg.data_name, cfg.prod_filter)
    df1.to_csv(Path(cfg.Y_df) / "Y_df.csv")
    df2.to_csv(Path(cfg.S_df) / "S_df.csv")

When I run all of this, I get an error in the prep component itself:

Execution failed. User process 'python' exited with status code 2. Please check log file 'user_logs/std_log.txt' for error details. Error: /bin/bash: /azureml-envs/azureml_bbh34278yrnrfuehn78340/lib/libtinfo.so.6: no version information available (required by /bin/bash)
usage: data_processing.py [--help] [--hydra-help] [--version]
                          [--cfg {job,hydra,all}] [--resolve]
                          [--package PACKAGE] [--run] [--multirun]
                          [--shell-completion] [--config-path CONFIG_PATH]
                          [--config-name CONFIG_NAME]
                          [--config-dir CONFIG_DIR]
                          [--experimental-rerun EXPERIMENTAL_RERUN]
                          [--info [{all,config,defaults,defaults-tree,plugins,searchpath}]]
                          [overrides ...]
data_processing.py: error: unrecognized arguments: --Y_df --S_df /mnt/azureml/cr/j/ffyh7fs984ryn8f733ff3/cap/data-capability/wd/S_df

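When Hydra is not involved, the code runs fine and the data is passed between the components, but as soon as Hydra is involved I get this error. Why is that?

Edit: here is the component configuration file for the prep step: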

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: preprocessing24
display_name: preprocessing24


outputs:
  Y_df:
    type: uri_folder

  S_df:
    type: uri_folder



code: ./preprocessing_final


environment: azureml:datapipeline-environment:4

command: >-
  python data_processing.py

The data preprocessing config file just contains a bunch of variables, but I added 2 more, which are:

Y_df:
  random_txt

S_df:
  random_txt

The main function of the data processing script is as shown above.

Answer 1

OK, here is what was happening.

This notation in the component command does not work:

  python preprocessing_script.py
  --Y_df ${{outputs.Y_df}} 
  --S_df ${{outputs.S_df}}

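That's because Hydra does not accept this notation (I think). The function decorated with @hydra.main has its command line parsed by Hydra itself, and, as the usage message in the error shows, Hydra only knows its own flags plus key=value overrides, so --Y_df and --S_df are reported as unrecognized arguments before the argparse code in main ever runs.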
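To make the failure mode concrete, here is a minimal sketch that uses only the standard library. The parser below is a stand-in I wrote for illustration, not Hydra's actual code: it assumes a parser that knows one flag of its own and treats everything else as positional overrides, which is the shape the usage message in the question suggests. The helper names build_parser and split_args and the /tmp paths are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-in for a Hydra-like parser (NOT Hydra's real code): one flag of
    # its own, everything else is collected as positional overrides.
    parser = argparse.ArgumentParser("data_processing.py")
    parser.add_argument("--config-name")
    parser.add_argument("overrides", nargs="*")
    return parser


def split_args(argv: list[str]) -> tuple[list[str], list[str]]:
    """Return (accepted overrides, rejected tokens) for a command line."""
    known, rejected = build_parser().parse_known_args(argv)
    return known.overrides, rejected


# argparse-style flags are unknown to this parser, so they are rejected.
print(split_args(["--Y_df", "/tmp/Y_df", "--S_df", "/tmp/S_df"]))

# key=value tokens do not start with "-", so they are accepted as overrides.
print(split_args(["+Y_df=/tmp/Y_df", "+S_df=/tmp/S_df"]))
```

The point of the sketch is only that whichever parser sees the command line first decides what is legal; a second argparse parser inside main never gets a chance to claim --Y_df and --S_df.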

Instead, this notation works:

  python data_processing.py '+Y_df=${{outputs.Y_df}}' '+S_df=${{outputs.S_df}}'

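What this does is add the two new variables, Y_df and S_df, to the config. Note that the + prefix appends a key that is not in the config yet; if Y_df and S_df are already defined in the config file (like the placeholders in the question), I would expect Hydra to want plain Y_df=... or ++Y_df=... instead. Either way, these variables can then be accessed in the program just like every other variable in the config file, by doing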

cfg.Y_df

or

cfg.S_df
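For readers unsure what the override prefixes do, here is a toy model in plain Python. It is not Hydra's implementation and works on a flat dict only; it just mirrors my understanding of the three prefixes (+ appends a new key, ++ adds or replaces, no prefix replaces an existing key). The function name apply_overrides and the sample config values are hypothetical.

```python
def apply_overrides(cfg: dict, overrides: list[str]) -> dict:
    """Toy model of Hydra-style override prefixes on a flat config.

    "+key=value"  adds a key that must not exist yet,
    "++key=value" adds the key or replaces its value,
    "key=value"   replaces a key that must already exist.
    """
    result = dict(cfg)
    for item in overrides:
        name, value = item.split("=", 1)
        if name.startswith("++"):
            result[name[2:]] = value
        elif name.startswith("+"):
            key = name[1:]
            if key in result:
                raise KeyError(f"cannot append, {key!r} is already in the config")
            result[key] = value
        else:
            if name not in result:
                raise KeyError(f"{name!r} is not in the config, use +{name}=...")
            result[name] = value
    return result


# Hypothetical config contents, standing in for config_file.yaml.
base = {"data_name": "example", "prod_filter": "all"}
cfg = apply_overrides(base, ["+Y_df=/tmp/Y_df", "+S_df=/tmp/S_df"])
print(cfg["Y_df"], cfg["S_df"])
```

In the real script the same lookup is cfg.Y_df and cfg.S_df on the DictConfig that Hydra passes to main, so the argparse block is no longer needed.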
