On May 18, 2022, PyTorch announced support for GPU-accelerated PyTorch training on Mac.
I followed the steps below to set up PyTorch on my MacBook Air M1 (using miniconda).
$ conda create -n torch-nightly python=3.8
$ conda activate torch-nightly
$ pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
I am trying to run a script from Udacity's Deep Learning course, available here.
The script moves the models to the GPU with the following code:
G.cuda()
D.cuda()
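However, this does not work on the M1 chip, because there is no CUDA.
If we want to move the models to the M1 GPU, move the tensors to the M1 GPU, and train entirely on the M1 GPU, what should we do?
If it is relevant: G and D are the generator and discriminator of a GAN.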
import torch
import torch.nn as nn
import torch.nn.functional as F

# conv() and deconv() are assumed to be helper functions defined elsewhere in the course notebook.
class Discriminator(nn.Module):
    def __init__(self, conv_dim=32):
        super(Discriminator, self).__init__()
        self.conv_dim = conv_dim
        # complete init function
        self.cv1 = conv(in_channels=3, out_channels=conv_dim, kernel_size=4, stride=2, padding=1, batch_norm=False) # 32*32*3 -> 16*16*32
        self.cv2 = conv(in_channels=conv_dim, out_channels=conv_dim*2, kernel_size=4, stride=2, padding=1, batch_norm=True) # 16*16*32 -> 8*8*64
        self.cv3 = conv(in_channels=conv_dim*2, out_channels=conv_dim*4, kernel_size=4, stride=2, padding=1, batch_norm=True) # 8*8*64 -> 4*4*128
        self.fc1 = nn.Linear(in_features=4*4*conv_dim*4, out_features=1, bias=True)

    def forward(self, x):
        # complete forward function
        # each layer consumes the previous layer's output, not the raw input x
        out = F.leaky_relu(self.cv1(x), 0.2)
        out = F.leaky_relu(self.cv2(out), 0.2)
        out = F.leaky_relu(self.cv3(out), 0.2)
        out = out.view(-1, 4*4*self.conv_dim*4)
        out = self.fc1(out)
        return out

D = Discriminator(conv_dim)
class Generator(nn.Module):
    def __init__(self, z_size, conv_dim=32):
        super(Generator, self).__init__()
        self.conv_dim = conv_dim
        self.z_size = z_size
        # complete init function
        self.fc1 = nn.Linear(in_features=z_size, out_features=4*4*conv_dim*4)
        self.dc1 = deconv(in_channels=conv_dim*4, out_channels=conv_dim*2, kernel_size=4, stride=2, padding=1, batch_norm=True)
        self.dc2 = deconv(in_channels=conv_dim*2, out_channels=conv_dim, kernel_size=4, stride=2, padding=1, batch_norm=True)
        self.dc3 = deconv(in_channels=conv_dim, out_channels=3, kernel_size=4, stride=2, padding=1, batch_norm=False)

    def forward(self, x):
        # complete forward function
        x = self.fc1(x)
        x = x.view(-1, self.conv_dim*4, 4, 4)
        x = F.relu(self.dc1(x))
        x = F.relu(self.dc2(x))
        x = torch.tanh(self.dc3(x))  # F.tanh is deprecated in favor of torch.tanh
        return x

G = Generator(z_size=z_size, conv_dim=conv_dim)
This is what I used:
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    G.to(mps_device)
    D.to(mps_device)
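If the same script also has to run on machines without an M1 GPU, one option is to pick the device once and reuse it everywhere. The following is only a sketch: `pick_device` is a hypothetical helper name, and the small `nn.Sequential` model stands in for `G` and `D`.

```python
import torch
import torch.nn as nn


def pick_device() -> torch.device:
    """Prefer MPS, then CUDA, then fall back to the CPU."""
    # Older builds have no torch.backends.mps at all, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()

# Stand-in for G and D; nn.Module.to() moves the parameters in place.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model.to(device)
print("running on", device)
```

With this in place, `G.cuda()` and `D.cuda()` in the course script would become `G.to(device)` and `D.to(device)`.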
Similarly, for every tensor that I wanted to move to the M1 GPU, I used:
tensor_ = tensor_.to(mps_device)
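One detail that is easy to miss: for a module, `.to()` moves the parameters in place, while for a tensor it returns a new tensor, so the result has to be assigned back. A minimal sketch, assuming a PyTorch build that has `torch.backends.mps` and using a toy `nn.Linear` in place of the GAN:

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the snippet also runs where MPS is unavailable.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = nn.Linear(4, 2)
model.to(device)             # modules are moved in place

x = torch.randn(8, 4)
x = x.to(device)             # tensors are not: keep the returned tensor

# Creating the tensor directly on the device avoids the extra copy.
z = torch.randn(8, 4, device=device)

out = model(x) + model(z)
print(out.device, tuple(out.shape))
```

The model and every tensor that takes part in an operation need to live on the same device; otherwise PyTorch raises a device mismatch error.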
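Some operations are not yet implemented for MPS, so we may need to set an environment variable to use the CPU as a fallback. One error I ran into while executing the script was: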
# NotImplementedError: The operator 'aten::_slow_conv2d_forward' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
To work around it, I set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 for the conda environment and then reactivated it:
conda env config vars set PYTORCH_ENABLE_MPS_FALLBACK=1
conda activate <test-env>
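A possible alternative is to set the variable from the script itself instead of storing it in the conda environment. The sketch below assumes the variable has to be in place before `torch` is imported; `enable_mps_fallback` is a hypothetical helper name, not part of PyTorch.

```python
import os


def enable_mps_fallback() -> None:
    """Ask PyTorch to run ops that MPS lacks on the CPU instead."""
    # setdefault keeps any value that was already exported in the shell.
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


enable_mps_fallback()
# Import PyTorch only after this point, e.g.:
# import torch
```

As the error message warns, ops that fall back to the CPU will run slower than they would natively on MPS.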
I would like to add to the answer above: make sure you are using the native arm64 build of Python for M1 (3.9.x) when installing the MPS version. If you use conda, run the following:
import platform
print(platform.platform())
to check whether x86 or arm64 is in use. The two errors I ran into were:
RuntimeError: Expected one of cpu, cuda, xpu, mkldnn, opengl, opencl, ideep, hip, ve, ort, mlc, xla, lazy, vulkan, meta, hpu device type at start of device string: mps
AttributeError: module 'torch.backends' has no attribute 'mps'
This happened because, even though I had installed the required PyTorch version, I was still running x86 Python.
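To catch this early, a script could check the interpreter architecture before touching MPS. This is only a sketch using the standard library; `is_native_arm64` is a hypothetical helper name.

```python
import platform
import sys


def is_native_arm64(machine=None):
    """Return True when the interpreter reports an arm64 architecture."""
    machine = platform.machine() if machine is None else machine
    return machine.lower() in {"arm64", "aarch64"}


print("Python", sys.version.split()[0], "on", platform.machine())
if not is_native_arm64():
    # An x86 interpreter is the situation that produced the two errors above.
    print("This is not an arm64 interpreter; MPS is unlikely to be available.")
```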
To fix these issues, do the following:
This worked for me, although PyTorch on MPS is still very new and buggy. Hopefully it improves soon.