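The Azure Machine Learning workspace and compute cluster are created with Terraform. Is there a way to run a bash script on the cluster, using Terraform or the CLI, that creates an environment and installs Miniconda? Or to create an Azure Machine Learning compute cluster from a setup.sh configuration file?
Do you have any suggestions for solving this?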
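I tried running a bash script on an existing AML compute cluster and was able to configure the requirements successfully.
You can create the Azure Machine Learning workspace and compute cluster with Terraform and the Azure CLI, and then run a bash script that sets up the environment and installs Miniconda on the cluster.
My Terraform configuration: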
provider "azurerm" {
  features {}
}

data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "example" {
  name     = "demovk-rg"
  location = "west europe"
  tags = {
    "stage" = "test"
  }
}

resource "azurerm_application_insights" "example" {
  name                = "demovk-ai"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  application_type    = "web"
}

resource "azurerm_key_vault" "example" {
  name                     = "demovks-kv"
  location                 = azurerm_resource_group.example.location
  resource_group_name      = azurerm_resource_group.example.name
  tenant_id                = data.azurerm_client_config.current.tenant_id
  sku_name                 = "standard"
  purge_protection_enabled = true
}

resource "azurerm_storage_account" "example" {
  name                     = "demoksasb"
  location                 = azurerm_resource_group.example.location
  resource_group_name      = azurerm_resource_group.example.name
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_machine_learning_workspace" "example" {
  name                    = "demok-mlw"
  location                = azurerm_resource_group.example.location
  resource_group_name     = azurerm_resource_group.example.name
  application_insights_id = azurerm_application_insights.example.id
  key_vault_id            = azurerm_key_vault.example.id
  storage_account_id      = azurerm_storage_account.example.id

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_virtual_network" "example" {
  name                = "demovk-vnet"
  address_space       = ["10.1.0.0/16"]
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_subnet" "example" {
  name                 = "demovk-subnet"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.1.0.0/24"]
}

resource "azurerm_machine_learning_compute_cluster" "test" {
  name                          = "demovsb"
  location                      = azurerm_resource_group.example.location
  vm_priority                   = "LowPriority"
  vm_size                       = "Standard_DS2_v2"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.example.id
  subnet_resource_id            = azurerm_subnet.example.id

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = 1
    scale_down_nodes_after_idle_duration = "PT30S" # 30 seconds
  }

  identity {
    type = "SystemAssigned"
  }
}
Now create a bash script with the commands that set up the environment and install Miniconda. Here is a simple example of a setup.sh script:
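#!/bin/bash
set -e
# Install Miniconda silently (-b: batch mode, -p: install prefix)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p "$HOME/miniconda"
# Put conda on PATH for future interactive shells
echo 'export PATH="$HOME/miniconda/bin:$PATH"' >> "$HOME/.bashrc"
# A non-interactive script does not reliably pick up .bashrc, so load conda's
# shell functions directly; this is what makes "conda activate" work here
source "$HOME/miniconda/etc/profile.d/conda.sh"
# Create a Conda environment (-y skips the confirmation prompt)
conda create -y -n myenv python=3.7
conda activate myenv
# Install your required packages (package1 and package2 are placeholders)
conda install -y package1 package2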
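The Miniconda installer refuses to write into a -p prefix that already exists, so running this script a second time on the same node stops at the install step. A minimal sketch of an idempotent guard, assuming the same $HOME/miniconda prefix; the function name ensure_miniconda is hypothetical:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: install Miniconda into PREFIX only if it is not there yet.
# The installer exits with an error when the -p prefix already exists, so a
# second run of setup.sh on the same node would otherwise fail.
ensure_miniconda() {
  local prefix="$1"
  if [ -x "$prefix/bin/conda" ]; then
    echo "Miniconda already installed at $prefix"
    return 0
  fi
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
  bash miniconda.sh -b -p "$prefix"
  rm -f miniconda.sh
  echo "Miniconda installed at $prefix"
}
```

Calling ensure_miniconda "$HOME/miniconda" in place of the wget and bash miniconda.sh lines lets the script be re-run safely; the installer's own -u option (update an existing installation) is the alternative if you do want to install over an existing prefix.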
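After the cluster is created, you can run setup.sh on it with the Azure CLI, either directly or from Terraform (for example through a local-exec provisioner). The az ml CLI (v2) has no command that attaches a setup script to a compute cluster (creation and startup scripts are a compute instance feature), so one option is to submit the script as a command job that targets the cluster:
az ml job create -g <resource_group_name> -w <workspace_name> --file setup-job.yml
Here setup-job.yml is a hypothetical job specification whose command is bash setup.sh and whose compute is azureml:<cluster_name>. Each job runs in a fresh container, and with min_node_count = 0 the node is released shortly after it goes idle, so packages installed this way only last for that job. For dependencies that every job needs, define an Azure ML environment (a conda file on top of a base image) instead.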
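As a sketch of running setup.sh on the cluster through the Azure ML CLI v2, the script below writes a command-job specification and submits it with az ml job create. It assumes the CLI v2 commandJob YAML schema (the $schema, command, code, environment and compute keys) and that the ml extension is installed; the function names, the file name setup-job.yml and the library/python:latest image are illustrative assumptions, so check them against your workspace before relying on them.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: write a command-job spec that runs setup.sh on an AML cluster.
# Assumptions: CLI v2 commandJob schema; setup.sh sits in the directory uploaded
# as "code"; the Docker image is only an illustrative choice.
write_setup_job_spec() {
  local cluster_name="$1"   # e.g. demovsb from the Terraform config
  local spec_path="$2"      # e.g. setup-job.yml (hypothetical name)
  cat > "$spec_path" <<EOF
\$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: bash setup.sh
code: .
environment:
  image: library/python:latest
compute: azureml:${cluster_name}
EOF
}

# Hypothetical helper: submit the spec with the Azure ML CLI v2
# (requires "az extension add -n ml" and a logged-in az session).
submit_setup_job() {
  local resource_group="$1" workspace="$2" spec_path="$3"
  az ml job create -g "$resource_group" -w "$workspace" --file "$spec_path"
}
```

For the Terraform configuration in this answer that would be write_setup_job_spec demovsb setup-job.yml followed by submit_setup_job demovk-rg demok-mlw setup-job.yml, run from the directory that contains setup.sh, because code: . uploads the current directory with the job.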