PyTorch complex matrix-vector multiplication is slow on CPU

Question

I found that PyTorch is much slower than NumPy for complex-valued matrix-vector multiplication on the CPU:

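A few notes:

This holds on multiple systems for me
Memory is not the issue
Complex multiplication in torch does not max out the cores (unlike the other three cases)
torch version: 2.5.1+cu124
numpy version: 1.26.4
I verified that the computed results are the same
Both use double precision (i.e. 64 bits for real, 128 bits for complex)
Switching to single precision (torch.cfloat) makes it slightly faster, but not by much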
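
The equality and precision checks mentioned in the notes above could look roughly like the following minimal sketch. The size `n`, the seed, and the variable names are illustrative assumptions and are not taken from the original benchmark.

```python
import numpy as np
import torch

n = 200  # illustrative size (assumption, not from the original benchmark)
rng = np.random.default_rng(0)

A_np = rng.random((n, n)) + 1j * rng.random((n, n))
b_np = rng.random(n) + 1j * rng.random(n)

# torch.from_numpy shares memory with the array and preserves its dtype
A_t = torch.from_numpy(A_np)
b_t = torch.from_numpy(b_np)

# Double precision: a complex128 value is two 64-bit floats
print(A_np.dtype, A_t.dtype)

y_np = A_np @ b_np
y_t = A_t @ b_t

# Both libraries should agree up to floating-point rounding
same_double = np.allclose(y_np, y_t.numpy())

# Single precision: torch.cfloat is an alias of torch.complex64
y_t32 = A_t.to(torch.cfloat) @ b_t.to(torch.cfloat)
same_single = np.allclose(y_np, y_t32.numpy(), rtol=1e-3)

print("double agrees:", same_double, "| single agrees:", same_single)
```

The looser `rtol` for the single-precision comparison is a deliberate choice, since complex64 carries only about 7 significant decimal digits per component.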

Maybe I have something misconfigured?
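
One way to start checking for a configuration problem is to print the threading and build information that both libraries expose. This is only a diagnostic sketch: it shows what each library reports and does not by itself explain the slowdown.

```python
import numpy as np
import torch

# Number of threads PyTorch uses for intra-op parallelism on the CPU
num_threads = torch.get_num_threads()
print("torch intra-op threads:", num_threads)

# Build information for the installed PyTorch binary (BLAS backend, OpenMP, ...)
build_info = torch.__config__.show()
print(build_info)

# Threading details as PyTorch sees them (OpenMP / MKL thread settings)
print(torch.__config__.parallel_info())

# BLAS/LAPACK libraries NumPy was built against (prints to stdout)
np.show_config()
```

If the two libraries report different BLAS backends or thread counts, that difference is a natural first thing to compare across the systems being tested.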

Code that generates the plot above:

import torch
import numpy as np
import matplotlib.pyplot as plt
import time

maxn = 3000
nrep = 100

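def conv(M,latype):
    # Convert a NumPy array to the requested backend: 'numpy' or 'torch, <device>'
    if latype=='numpy':
        return np.array(M)
    if latype.startswith('torch,'):
        # latype[7:] skips the 7-character prefix 'torch, ' and leaves the device name
        return torch.tensor(M,device=latype[7:])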

def multtest(A,b):
    # Mean wall-clock time per product over nrep chained multiplications
    t0 = time.perf_counter()
    for i in range(nrep):
        b = A@b
    t1 = time.perf_counter()
    return (t1-t0)/nrep

ns = np.array(np.linspace(100,maxn,100),dtype=int)  # matrix sizes to benchmark

fig,axes = plt.subplots(1,2)
for ax,dtype in zip(axes,['real','complex']):
    Aorig = np.random.rand(maxn,maxn)
    borig = np.random.rand(maxn)
    if dtype == 'complex':
        Aorig = Aorig + 1.j*np.random.rand(maxn,maxn)
        borig = borig + 1.j*np.random.rand(maxn)

    for latype in ['numpy','torch, cpu']:
        A = conv(Aorig,latype)
        b = conv(borig,latype)
        ts = np.zeros(len(ns))
        for i,n in enumerate(ns):
            ts[i] = multtest(A[:n,:n],b[:n])
        ax.plot(ns,ts,label=latype)

    ax.legend()
    ax.set_title(dtype)
    ax.set_xlabel('vector/matrix size')
    ax.set_ylabel('mean matrix-vector mult time (sec)')

fig.tight_layout()
plt.show()

Answer 1

When I run the code, I get a different plot:

torch: 2.3.1
numpy: 1.26.4
CUDA: 12.2
NVIDIA driver: 535.183.01 (Ubuntu)
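
For comparing environments like the two listed in this thread, the library versions can be collected programmatically. This is a small sketch; the dictionary keys are illustrative, and the NVIDIA driver version is not available through these attributes, so it is left out.

```python
import numpy as np
import torch

versions = {
    "torch": torch.__version__,
    "numpy": np.__version__,
    # CUDA version the PyTorch binary was built with; None on CPU-only builds
    "CUDA (PyTorch build)": torch.version.cuda,
}

for name, value in versions.items():
    print(f"{name}: {value}")
```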
