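I need to create and run the Azure Spatial Analysis container on my desktop machine using WSL. I followed this tutorial: https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/spatial-analysis-container?tabs=desktop-machine. According to IoT Hub, everything should be running fine. But I get no output, and when I look at the logs of the spatial analysis module, I keep seeing this error: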
2024-03-06T19:46:45.429562642Z <warning> 93 [VIDEO_INGESTER-cognitiveservices_vision_spatialanalysis_1.store.spatialanalysisgraph.videosource] cognitiveservices_vision_spatialanalysis_1 Error: Failed to allocate shared buffer. Skipping frame.
2024-03-06T19:46:45.501593175Z <warning> 93 [VIDEO_INGESTER-cognitiveservices_vision_spatialanalysis_1.store.spatialanalysisgraph.videosource] cognitiveservices_vision_spatialanalysis_1 Failed to get CUDA handle: cudaIpcGetMemHandle failed with error 2
2024-03-06T19:46:45.502484545Z <error> 93 [VIDEO_INGESTER-cognitiveservices_vision_spatialanalysis_1.store.spatialanalysisgraph.videosource] cognitiveservices_vision_spatialanalysis_1 Cannot create cuda shared buffer. Size: 6684672
I don't know what to do. Here is the output of nvidia-smi as well:
| NVIDIA-SMI 530.30.02 Driver Version: 527.99 CUDA Version: 12.0 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 L... On | 00000000:01:00.0 On | N/A |
| N/A 55C P8 16W / 115W| 2609MiB / 6144MiB | 28% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
I tried restarting the modules and the whole IoT Edge runtime. I also checked the connectivity to IoT Hub, and that appears to be fine.
Thanks in advance for your help!
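The error is related to GPU memory: the module fails to allocate a shared CUDA buffer of 6684672 bytes (about 6.7 MB), and CUDA error code 2 is cudaErrorMemoryAllocation. If the available memory on your NVIDIA GeForce RTX 3060 is limited, allocating this buffer can fail. Make sure the system has enough free GPU memory and meets the Spatial Analysis container requirements.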
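To check whether free GPU memory is really the limiting factor, a minimal sketch along these lines compares the free memory reported by nvidia-smi with the buffer size from the log. The helper names bytes_to_mib and buffer_fits are hypothetical; the 6684672-byte size is taken from the error message in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the nvidia-smi query flags are real CLI options.

# Convert a byte count to whole MiB, rounding up.
bytes_to_mib() {
  echo $(( ($1 + 1048575) / 1048576 ))
}

# Succeed when the free memory (MiB) covers the requested buffer (bytes).
buffer_fits() {
  local free_mib=$1 buffer_bytes=$2
  [ "$free_mib" -ge "$(bytes_to_mib "$buffer_bytes")" ]
}

# Query the first GPU only when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  free_mib=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits 2>/dev/null | head -n 1)
  if buffer_fits "${free_mib:-0}" 6684672; then
    echo "Free GPU memory (${free_mib} MiB) covers the 6684672-byte buffer"
  else
    echo "Could not confirm enough free GPU memory for the 6684672-byte buffer"
  fi
fi
```

If this check passes while the error persists, free capacity alone is probably not the cause, and the rest of the setup is worth re-checking.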
Here is a step-by-step guide to installing and running the Spatial Analysis container:
Install the NVIDIA CUDA Toolkit and the NVIDIA graphics driver:
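wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600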
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
sudo reboot
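After the reboot, it can help to confirm which driver and CUDA versions are actually visible. The sketch below pulls both numbers out of the nvidia-smi banner line; driver_version and cuda_version are hypothetical helper names, and the parsing assumes the banner format shown in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read an nvidia-smi banner line on stdin.

# Print the value following "Driver Version:".
driver_version() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p'
}

# Print the value following "CUDA Version:".
cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p'
}

# Report both versions only when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  banner=$(nvidia-smi 2>/dev/null | grep 'Driver Version')
  echo "Driver: $(echo "$banner" | driver_version)"
  echo "CUDA:   $(echo "$banner" | cuda_version)"
fi
```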
Install Docker CE and nvidia-docker2:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y docker-ce nvidia-docker2
sudo systemctl restart docker
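Once Docker has restarted, one way to confirm that the NVIDIA runtime was registered is to inspect the runtime list that docker info reports. This is a rough sketch; has_nvidia_runtime is a hypothetical helper that simply looks for the "nvidia" key in that JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the JSON runtime list on stdin names "nvidia".
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

# Ask Docker for its registered runtimes only when the docker CLI is installed.
if command -v docker >/dev/null 2>&1; then
  if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime is registered with Docker"
  else
    echo "nvidia runtime not found; check the nvidia-docker2 installation"
  fi
fi
```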
Enable NVIDIA MPS:
sudo nvidia-smi --compute-mode=EXCLUSIVE_PROCESS
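echo "SHELL=/bin/bash" > /tmp/nvidia-mps-cronjob
echo "PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin" >> /tmp/nvidia-mps-cronjob
echo "@reboot   root    nvidia-smi --compute-mode=EXCLUSIVE_PROCESS" >> /tmp/nvidia-mps-cronjob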
sudo chown root:root /tmp/nvidia-mps-cronjob
sudo mv /tmp/nvidia-mps-cronjob /etc/cron.d/
sudo chown root:root /tmp/nvidia-mps.service
sudo mv /tmp/nvidia-mps.service /etc/systemd/system/
sudo systemctl --now enable nvidia-mps.service
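The commands above change ownership of /tmp/nvidia-mps.service and move it into place, but none of them creates that file. A minimal sketch of generating it before the chown step could look like the following. The helper name write_mps_unit is hypothetical, and the unit contents, including the /usr/bin/nvidia-cuda-mps-control path, are an assumption to be checked against the tutorial and your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd unit for the MPS control daemon to the given path.
# The unit contents are an assumption, not something stated in the answer.
write_mps_unit() {
  cat > "$1" <<'EOF'
[Unit]
Description=NVIDIA MPS control service
After=cron.service

[Service]
Restart=on-failure
ExecStart=/usr/bin/nvidia-cuda-mps-control -f

[Install]
WantedBy=multi-user.target
EOF
}

# Generate the file that the chown and mv steps expect to find.
write_mps_unit /tmp/nvidia-mps.service
```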
Configure Azure IoT Edge on the host:
Create an Azure IoT Hub instance:
sudo az login
sudo az account set --subscription "<name or ID of Azure Subscription>"
sudo az group create --name "<resource-group-name>" --location "<your-region>"
sudo az iot hub create --name "<iothub-name>" --sku S1 --resource-group "<resource-group-name>"
sudo az iot hub device-identity create --hub-name "<iothub-name>" --device-id "<device-name>" --edge-enabled
Install Azure IoT Edge:
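curl https://packages.microsoft.com/config/ubuntu/18.04/multiarch/prod.list > ./microsoft-prod.list
sudo cp ./microsoft-prod.list /etc/apt/sources.list.d/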
curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.gpg
sudo cp ./microsoft.gpg /etc/apt/trusted.gpg.d/
sudo apt-get update
sudo apt-get install iotedge=1.1* libiothsm-std=1.1*
Register the IoT Edge device:
sudo az iot hub device-identity connection-string show --device-id <device-id> --hub-name <hub-name>
sudo nano /etc/iotedge/config.yaml # Replace ADD DEVICE CONNECTION STRING HERE with the connection string
sudo systemctl restart iotedge
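A mistyped or unreplaced connection string is an easy mistake at this step. The sketch below checks that a string at least has the three fields of a device connection string before it goes into config.yaml. The helper name valid_connection_string and the sample values are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a string has the HostName, DeviceId and
# SharedAccessKey fields, in that order, separated by semicolons.
valid_connection_string() {
  case "$1" in
    HostName=*\;DeviceId=*\;SharedAccessKey=*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder value for illustration only; substitute the real string.
conn='HostName=example-hub.azure-devices.net;DeviceId=example-device;SharedAccessKey=example-key'
if valid_connection_string "$conn"; then
  echo "connection string looks well formed"
else
  echo "connection string is missing a field"
fi
```

After the restart, `sudo iotedge check` and `sudo iotedge list` report whether the runtime is configured and which modules are running.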
Deploy the Spatial Analysis container:
Deploy the container:
sudo az login
sudo az extension add --name azure-iot
sudo az iot edge set-modules --hub-name "<iothub-name>" --device-id "<device-name>" --content DeploymentManifest.json --subscription "<name or ID of Azure Subscription>"
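Once the deployment has been applied, the module logs show whether the shared-buffer errors from the question are still occurring. As a rough sketch, the hypothetical helper count_buffer_errors below counts log lines that match the three messages quoted in the question; the module name in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines on stdin that report the shared-buffer failures.
# "|| true" keeps the helper from failing when grep finds no match (it still prints 0).
count_buffer_errors() {
  grep -c -E 'Failed to allocate shared buffer|cudaIpcGetMemHandle failed|Cannot create cuda shared buffer' || true
}

# Example usage, with the module name left as a placeholder:
# sudo iotedge logs <module-name> 2>&1 | count_buffer_errors
```

A count that stays at 0 after the module has processed some frames suggests the buffer allocation is now succeeding.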