Run a bash script on an existing AML compute cluster


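The Azure Machine Learning workspace and compute cluster were created with Terraform. Is there a way, using Terraform or the CLI, to run a bash script on the cluster that creates an environment and installs Miniconda? Alternatively, can an Azure Machine Learning compute cluster be created with a setup.sh configuration file?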

Do you have any suggestions for solving this problem?

azure automation terraform command-line-interface infrastructure-as-code
1 Answer

I tried running a bash script on an existing AML compute cluster, and I was able to configure the requirement successfully.

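You can create the Azure Machine Learning workspace and compute cluster with Terraform and the Azure CLI, and then run a bash script that sets up the environment and installs Miniconda on the cluster.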
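A minimal sketch of that two-step flow, assuming the resource names used in this answer's Terraform configuration (demovk-rg, demok-mlw, demovsb). The DRY_RUN switch is a hypothetical addition that is on by default and only prints each command; set DRY_RUN=0 to actually run terraform and az. az ml compute show is used here only to confirm the cluster exists before any script is run against it.

```bash
#!/bin/bash
# Two-step workflow sketch: Terraform provisions the workspace and cluster,
# then the Azure CLI inspects the existing cluster.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
set -eu

# Assumed names, taken from the Terraform configuration in this answer.
RG="${RG:-demovk-rg}"
WS="${WS:-demok-mlw}"
CLUSTER="${CLUSTER:-demovsb}"
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: create the resources declared in the .tf files.
provision() {
  run terraform init -input=false
  run terraform apply -auto-approve -input=false
}

# Step 2: confirm the compute cluster exists in the workspace.
show_cluster() {
  run az ml compute show --name "$CLUSTER" --resource-group "$RG" --workspace-name "$WS"
}

provision
show_cluster
```

Running it unchanged only prints the three commands, which is a cheap way to review them before pointing the script at a real subscription.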

My Terraform configuration:

provider "azurerm" {
  features {}
}

data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "example" {
  name     = "demovk-rg"
  location = "west europe"
  tags = {
    "stage" = "test"
  }
}

resource "azurerm_application_insights" "example" {
  name                = "demovk-ai"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  application_type    = "web"
}

resource "azurerm_key_vault" "example" {
  name                = "demovks-kv"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  tenant_id           = data.azurerm_client_config.current.tenant_id

  sku_name = "standard"

  purge_protection_enabled = true
}

resource "azurerm_storage_account" "example" {
  name                     = "demoksasb"
  location                 = azurerm_resource_group.example.location
  resource_group_name      = azurerm_resource_group.example.name
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_machine_learning_workspace" "example" {
  name                    = "demok-mlw"
  location                = azurerm_resource_group.example.location
  resource_group_name     = azurerm_resource_group.example.name
  application_insights_id = azurerm_application_insights.example.id
  key_vault_id            = azurerm_key_vault.example.id
  storage_account_id      = azurerm_storage_account.example.id

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_virtual_network" "example" {
  name                = "demovk-vnet"
  address_space       = ["10.1.0.0/16"]
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_subnet" "example" {
  name                 = "demovk-subnet"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.1.0.0/24"]
}

resource "azurerm_machine_learning_compute_cluster" "test" {
  name                          = "demovsb"
  location                      = azurerm_resource_group.example.location
  vm_priority                   = "LowPriority"
  vm_size                       = "Standard_DS2_v2"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.example.id
  subnet_resource_id            = azurerm_subnet.example.id

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = 1
    scale_down_nodes_after_idle_duration = "PT30S" # 30 seconds
  }

  identity {
    type = "SystemAssigned"
  }
}
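The scale_down_nodes_after_idle_duration value is an ISO 8601 duration, so "PT30S" means 30 seconds, as the inline comment says. The helper below is a small sketch (the function name is hypothetical) for turning such a value into seconds before applying it; it deliberately handles only hours, minutes and seconds and rejects anything else.

```bash
#!/bin/bash
# Convert the time part of an ISO 8601 duration such as "PT30S" or "PT1H2M"
# into seconds, e.g. to sanity-check scale_down_nodes_after_idle_duration.
# Assumption: only PT[nH][nM][nS] is handled; day parts like "P1D" are rejected.

iso_duration_to_seconds() {
  local d="$1" rest num after unit total=0
  case "$d" in
    PT?*) rest="${d#PT}" ;;
    *) echo "unsupported duration: $d" >&2; return 1 ;;
  esac
  while [ -n "$rest" ]; do
    num="${rest%%[!0-9]*}"          # leading digits
    after="${rest#"$num"}"          # unit letter plus the remainder
    unit="${after%"${after#?}"}"    # first character of $after
    rest="${after#?}"
    if [ -z "$num" ] || [ -z "$unit" ]; then
      echo "malformed duration: $d" >&2; return 1
    fi
    while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
      num="${num#0}"                # strip leading zeros so "08" is not read as octal
    done
    case "$unit" in
      H) total=$((total + num * 3600)) ;;
      M) total=$((total + num * 60)) ;;
      S) total=$((total + num)) ;;
      *) echo "unsupported unit '$unit' in: $d" >&2; return 1 ;;
    esac
  done
  echo "$total"
}

iso_duration_to_seconds "PT30S"    # prints 30
```

A timeout this short lets the cluster drop back to zero nodes almost immediately, which keeps low-priority costs down, but it also means anything installed on a node by hand does not survive once that node is released.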


Now create a bash script containing the commands that set up the environment and install Miniconda. Here is a simple example of a setup.sh script:

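#!/bin/bash
# Stop at the first failing command
set -e

# Install Miniconda non-interactively (-b) into $HOME/miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p "$HOME/miniconda"
echo 'export PATH="$HOME/miniconda/bin:$PATH"' >> "$HOME/.bashrc"

# Make "conda activate" available in this non-interactive script;
# sourcing ~/.bashrc usually does nothing for non-interactive shells
source "$HOME/miniconda/etc/profile.d/conda.sh"

# Create a Conda environment (-y skips the confirmation prompt)
conda create -y -n myenv python=3.7
conda activate myenv

# Install your required packages (package1 and package2 are placeholders)
conda install -y package1 package2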
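If setup.sh may run more than once on the same node, a guard against reinstalling helps. The sketch below is an optional refinement, not part of the original script: it picks the installer for the node's CPU architecture (the x86_64 and aarch64 Miniconda installers both exist under repo.anaconda.com/miniconda) and skips the download when a conda binary is already present. The function names and the MINICONDA_PREFIX variable are hypothetical.

```bash
#!/bin/bash
# Idempotent Miniconda install sketch: choose the installer for this CPU
# architecture and skip the download if conda is already installed.
set -eu

# Assumed install location, matching the -p path used in setup.sh.
MINICONDA_PREFIX="${MINICONDA_PREFIX:-$HOME/miniconda}"

# Print the installer URL for the given (or detected) architecture.
miniconda_url() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|aarch64)
      echo "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-${arch}.sh" ;;
    *)
      echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}

# Succeeds (exit 0) when no executable conda exists under the prefix yet.
needs_install() {
  [ ! -x "$MINICONDA_PREFIX/bin/conda" ]
}

# Download and run the batch-mode installer only when needed.
install_miniconda() {
  local url
  if needs_install; then
    url="$(miniconda_url)"
    wget -q "$url" -O /tmp/miniconda.sh
    bash /tmp/miniconda.sh -b -p "$MINICONDA_PREFIX"
  fi
}
```

In setup.sh, calling install_miniconda would replace the wget and bash miniconda.sh lines; the rest of the script stays the same.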

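Once the cluster has been created, you can run the setup.sh script on the compute cluster with the Azure CLI, for example from Terraform through a local-exec provisioner. Here is an example: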

az ml compute list -g <resource_group_name> -w <workspace_name> \
  --query "[?name=='<cluster_name>'].id" --output tsv |
  xargs -I {} az ml compute node setup -g <resource_group_name> -w <workspace_name> --target-id {} --code setup.sh

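This command retrieves the ID of the compute target (the cluster) and attaches the setup.sh script to it. Before relying on it, check az ml compute --help for your installed version of the ml extension: setup scripts (creation and startup scripts) are documented for AML compute instances, while for compute clusters the usual way to get Conda packages onto the nodes is an AML environment built from a Conda file.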
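One weakness of the pipeline above is that, with -I {}, xargs runs nothing when the query returns no ID, so a typo in the cluster name passes silently. A minimal sketch of a lookup that fails loudly instead is shown below; get_compute_id is a hypothetical helper, and it relies only on az ml compute list with the same --query and --output tsv arguments as the original command. What you do with the ID afterwards is left to the caller.

```bash
#!/bin/bash
# Resolve the ARM ID of a compute target by name and fail with a clear
# message if the workspace has no compute with that name.
set -eu

get_compute_id() {
  local rg="$1" ws="$2" name="$3" id
  # Same lookup as the original pipeline, captured into a variable.
  id="$(az ml compute list -g "$rg" -w "$ws" \
          --query "[?name=='$name'].id" --output tsv)"
  if [ -z "$id" ]; then
    echo "compute target '$name' not found in workspace '$ws'" >&2
    return 1
  fi
  echo "$id"
}
```

Usage, with the names from the Terraform configuration above: cluster_id="$(get_compute_id demovk-rg demok-mlw demovsb)". Under set -e the calling script stops right there if the cluster is missing, instead of continuing with an empty ID.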

© www.soinside.com 2019 - 2024. All rights reserved.