How do I use torch.compile properly?

Question

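I'm currently trying to use PyTorch 2.0 to improve the training performance of my project. I've heard that torch.compile can speed up some models.

So my question (for now) is simple: how should I use torch.compile with a large model?

For example, should I use torch.compile like this?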

import torch
import torch.nn as nn

class BigModel(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.model = nn.Sequential(
            SmallBlock(),
            SmallBlock(),
            SmallBlock(),
            ...
        )
        ...

class SmallBlock(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.model = nn.Sequential(
            ...some small model...
        )

model = BigModel()
model_opt = torch.compile(model)  # compile only the top-level model

Or like this?

class BigModel(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.model = nn.Sequential(
            SmallBlock(),
            SmallBlock(),
            SmallBlock(),
            ...
        )
        ...

class SmallBlock(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.model = nn.Sequential(
            ...some small model...
        )
        self.model = torch.compile(self.model)  # compile each block individually

model = BigModel()
model_opt = torch.compile(model)  # and compile the top-level model as well

To summarize:

Should I compile each layer myself, or does torch.compile do that automatically?
Are there any tips for using torch.compile properly?

To be honest, I tried both and saw no difference between them.

Also, the speedup was not significant: I measured only about 5 to 10% on my model.

Answer 1

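PyTorch developer here. There are a lot of variables in your question:

What hardware are you using? Speedups are most pronounced on A100 or A10G GPUs.
If you are on one of those, have you enabled tensor cores?
Compilation happens during the first batch. What is your batch size? If it is small, using mode="reduce-overhead" will really make things faster, because it enables CUDA graphs, which help reduce the overhead of launching small kernels.
You should compile the whole model that you actually run. We also have some utilities to allow or disallow compilation of subgraphs, which you can find here: https://pytorch.org/docs/master/_dynamo.html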
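
To make the points above concrete, here is a minimal sketch, not a definitive recipe. The layer sizes, the `check_finite` helper, and the `compile_whole_model` wrapper are assumptions made up for illustration. `torch.set_float32_matmul_precision("high")` is one way to let float32 matmuls use TF32 tensor cores, and `torch._dynamo.disable` is one of the utilities for keeping a function out of the compiled graph.

```python
import torch
import torch._dynamo
import torch.nn as nn

# Let float32 matmuls use TF32 tensor cores on GPUs that have them (e.g. A100, A10G).
torch.set_float32_matmul_precision("high")


@torch._dynamo.disable
def check_finite(x: torch.Tensor) -> None:
    # Hypothetical debug helper; the decorator keeps it out of the compiled graph.
    assert torch.isfinite(x).all(), "non-finite activation"


class SmallBlock(nn.Module):
    # Stand-in for the question's SmallBlock; the layers are assumptions.
    def __init__(self, dim: int = 64):
        super().__init__()
        self.model = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x):
        return self.model(x)


class BigModel(nn.Module):
    def __init__(self, dim: int = 64, depth: int = 3):
        super().__init__()
        self.model = nn.Sequential(*[SmallBlock(dim) for _ in range(depth)])

    def forward(self, x):
        out = self.model(x)
        check_finite(out)  # runs eagerly, outside the compiled graph
        return out


def compile_whole_model(model: nn.Module, **compile_kwargs) -> nn.Module:
    # One torch.compile call on the module you actually run; the nested
    # SmallBlocks are traced as part of its forward pass.
    return torch.compile(model, **compile_kwargs)


model = BigModel()
model_opt = compile_whole_model(model)  # compilation is deferred to the first call
```

The first call such as model_opt(torch.randn(8, 64)) triggers compilation, so it is much slower than later calls and should not be counted when measuring speedup.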

Answer 2

The default mode of torch.compile doesn't seem to help much, but it has another mode that can really speed up your model:

torch.compile(${yourmodel}, mode="reduce-overhead")
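
If you want to check whether a mode actually helps your model, a rough timing harness like the one below may be useful. It is only a sketch: `mean_step_time` and `compare_modes` are hypothetical helper names, the warm-up and iteration counts are arbitrary, and it times forward passes under `torch.no_grad()` rather than full training steps. Warm-up calls are excluded because compilation happens during the first calls.

```python
import time

import torch
import torch._dynamo
import torch.nn as nn


def mean_step_time(fn, x: torch.Tensor, warmup: int = 3, iters: int = 20) -> float:
    """Average seconds per forward call of fn(x), excluding warm-up calls."""
    use_cuda = x.is_cuda
    with torch.no_grad():
        for _ in range(warmup):  # the first calls include compilation time
            fn(x)
        if use_cuda:
            torch.cuda.synchronize()  # finish queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            fn(x)
        if use_cuda:
            torch.cuda.synchronize()  # wait for the timed GPU work to finish
    return (time.perf_counter() - start) / iters


def compare_modes(model: nn.Module, x: torch.Tensor,
                  modes=("default", "reduce-overhead")) -> dict:
    """Time eager execution and each torch.compile mode on the same input."""
    results = {"eager": mean_step_time(model, x)}
    for mode in modes:
        torch._dynamo.reset()  # drop earlier compiled code so each mode starts fresh
        results[mode] = mean_step_time(torch.compile(model, mode=mode), x)
    return results
```

Calling compare_modes(model, batch) with your own model and a representative batch on the same device returns a dict of seconds per call, keyed by "eager" and by each mode name.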
