How do I move a PyTorch model to the GPU on Apple M1 chips?

Problem description · Votes: 0 · Answers: 2

On May 18, 2022, PyTorch announced support for GPU-accelerated PyTorch training on Mac.

I followed the process below to set up PyTorch on my MacBook Air M1 (using miniconda):

$ conda create -n torch-nightly python=3.8

$ conda activate torch-nightly

$ pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
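As a quick sanity check after installing (a minimal sketch; the exact version string will differ on your machine), you can confirm that the nightly wheel actually exposes the MPS backend:

import torch

print(torch.__version__)                  # should report a recent nightly build
print(torch.backends.mps.is_built())      # True if this wheel was compiled with MPS support
print(torch.backends.mps.is_available())  # True on macOS 12.3+ with an Apple-silicon GPU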

I am trying to run a script from the Udacity Deep Learning course, available here.

The script uses the following code to move the models to the GPU:

G.cuda()
D.cuda()

However, this does not work on the M1 chip, since there is no CUDA.

What should we do if we want to move the models to the M1 GPU, move the tensors to the M1 GPU, and train entirely on the M1 GPU?


In case it is relevant: G and D are the generator and discriminator of a GAN.

import torch
import torch.nn as nn
import torch.nn.functional as F

# `conv` and `deconv` below are the helper functions defined earlier in the course
# notebook (a conv / transpose-conv layer optionally followed by batch norm).

class Discriminator(nn.Module):

    def __init__(self, conv_dim=32):
        super(Discriminator, self).__init__()
        self.conv_dim = conv_dim
        self.cv1 = conv(in_channels=3, out_channels=conv_dim, kernel_size=4, stride=2, padding=1, batch_norm=False)            # 32*32*3  -> 16*16*32
        self.cv2 = conv(in_channels=conv_dim, out_channels=conv_dim*2, kernel_size=4, stride=2, padding=1, batch_norm=True)    # 16*16*32 -> 8*8*64
        self.cv3 = conv(in_channels=conv_dim*2, out_channels=conv_dim*4, kernel_size=4, stride=2, padding=1, batch_norm=True)  # 8*8*64   -> 4*4*128
        self.fc1 = nn.Linear(in_features=4*4*conv_dim*4, out_features=1, bias=True)

    def forward(self, x):
        out = F.leaky_relu(self.cv1(x), 0.2)
        out = F.leaky_relu(self.cv2(out), 0.2)   # chain the previous layer's output, not x
        out = F.leaky_relu(self.cv3(out), 0.2)
        out = out.view(-1, 4*4*self.conv_dim*4)  # conv_dim is stored on the module
        out = self.fc1(out)
        return out

D = Discriminator(conv_dim)

class Generator(nn.Module):
    def __init__(self, z_size, conv_dim=32):
        super(Generator, self).__init__()
        self.conv_dim = conv_dim
        self.z_size = z_size
        self.fc1 = nn.Linear(in_features=z_size, out_features=4*4*conv_dim*4)
        self.dc1 = deconv(in_channels=conv_dim*4, out_channels=conv_dim*2, kernel_size=4, stride=2, padding=1, batch_norm=True)
        self.dc2 = deconv(in_channels=conv_dim*2, out_channels=conv_dim, kernel_size=4, stride=2, padding=1, batch_norm=True)
        self.dc3 = deconv(in_channels=conv_dim, out_channels=3, kernel_size=4, stride=2, padding=1, batch_norm=False)

    def forward(self, x):
        x = self.fc1(x)
        x = x.view(-1, self.conv_dim*4, 4, 4)  # conv_dim is stored on the module
        x = F.relu(self.dc1(x))
        x = F.relu(self.dc2(x))
        x = torch.tanh(self.dc3(x))            # F.tanh is deprecated; torch.tanh is equivalent
        return x

G = Generator(z_size=z_size, conv_dim=conv_dim)
Tags: pytorch, metal, apple-m1
2 Answers

27 votes

Here is what I used:

if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    G.to(mps_device)
    D.to(mps_device)

Similarly, for every tensor that I wanted to move to the M1 GPU, I used:

tensor_ = tensor_.to(mps_device)
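Putting the two together, here is a minimal device-agnostic sketch (the z, real_images, batch_size, and z_size variables below are placeholders standing in for the script's own batches, not code from the course notebook):

import torch

# Pick the best available device: Apple-silicon GPU, then NVIDIA GPU, then CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# nn.Module.to() moves all of a model's parameters and buffers to the device.
G = G.to(device)
D = D.to(device)

# Every tensor used in the forward/backward pass must live on the same device.
z = torch.randn(batch_size, z_size, device=device)  # placeholder: noise batch fed to G
real_images = real_images.to(device)                # placeholder: image batch from the DataLoader

With this pattern the same training script runs unchanged whether CUDA, MPS, or only the CPU is available.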

Some operations are not yet implemented for MPS, so you may need to set an environment variable to fall back to the CPU for them. One error I ran into while running the script was:

# NotImplementedError: The operator 'aten::_slow_conv2d_forward' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

To work around this, I set the following environment variable:

PYTORCH_ENABLE_MPS_FALLBACK=1

conda env config vars set PYTORCH_ENABLE_MPS_FALLBACK=1
conda activate <test-env>
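If you prefer not to store the variable in the conda environment, it can also be exported in the shell (export PYTORCH_ENABLE_MPS_FALLBACK=1) or set from Python; a minimal sketch, assuming the variable is set before torch is imported (setting it afterwards is generally too late):

import os

# Must be set before `import torch`, otherwise the fallback flag is not picked up.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402  (deliberately imported after setting the variable)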

References:

  1. https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/
  2. https://pytorch.org/docs/master/notes/mps.html
  3. https://sebastianraschka.com/blog/2022/pytorch-m1-gpu.html
  4. https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#setting-environment-variables

3 votes

I would like to add to the answer above that, when installing the MPS-enabled build, you should make sure you are using the native arm64 build of Python (3.9.x) for the M1. If you are using conda, run the following:

import platform
print(platform.platform())

to check whether an x86 or arm64 build is in use. The two errors I ran into were:

RuntimeError: Expected one of cpu, cuda, xpu, mkldnn, opengl, opencl, ideep, hip, ve, ort, mlc, xla, lazy, vulkan, meta, hpu device type at start of device string: mps

AttributeError: module 'torch.backends' has no attribute 'mps'

This was because, even though I had installed the required PyTorch version, I was still running an x86 build of Python.

To fix this, do the following (a quick re-check is sketched after the steps):

  1. conda create -n py39_native python=3.9 -c conda-forge --override-channels
  2. conda activate py39_native
  3. conda config --env --set subdir osx-arm64
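Once the environment has been recreated and the nightly wheel reinstalled, here is a small re-check sketch (the getattr guard is only there because, as the AttributeError above shows, an x86 build may not expose torch.backends.mps at all):

import platform
import torch

print(platform.machine())  # should now report 'arm64' rather than 'x86_64'

# Guard with getattr: an x86 build may lack the `mps` attribute entirely.
mps = getattr(torch.backends, "mps", None)
print(mps is not None and mps.is_available())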

This worked for me, although PyTorch on MPS is still very new and buggy. Hopefully it improves soon.
