Efficient weighted vector distance calculation with numpy

Question

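I want to compute the squared Euclidean distance between two sets of points, inputs and testing. inputs is typically a real array of size ~(200, N), whereas testing is typically ~(1e8, N), with N around 10. The distance should be scaled in each of the N dimensions, so I would be summing the expression scale[j]*(inputs[i,j] - testing[ii,j])**2 (where scale is the scaling vector) over the N dimensions. I am trying to make this as fast as possible, particularly since N can be large. My first test is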

import numpy as np

def old_version(inputs, testing, x0):
    nn, d1 = testing.shape
    n, d1 = inputs.shape
    b = np.zeros((n, nn))
    # Accumulate the scaled squared difference one dimension at a time
    for d in range(d1):
        b += x0[d] * (((np.tile(inputs[:, d], (nn, 1)) -
             np.tile(testing[:, d], (n, 1)).T))**2).T
    return b

Nothing too fancy. I then tried using scipy.spatial.distance.cdist, although I still have to loop over the dimensions to get the scaling right

import scipy.spatial.distance as dist

def new_version(inputs, testing, x0):
    nn, d1 = testing.shape
    n, d1 = inputs.shape
    b = np.zeros((n, nn))

    # One cdist call per dimension, each weighted by its scale factor
    for d in range(d1):
        b += x0[d] * dist.cdist(inputs[:, d][:, None],
             testing[:, d][:, None], 'sqeuclidean')
    return b

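It would appear that new_version scales better (for N > 1000), but I'm not sure that I've gone as fast as possible here. Any further ideas are much appreciated!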

Answer 1

This code ran about 10 times faster than your implementation for me; give it a try:

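import numpy as np
import scipy.spatial.distance as dist

x = np.random.randn(200, 10)
y = np.random.randn(100000, 10)  # integer size; a float such as 1e5 is rejected by current numpy
scale = np.abs(np.random.randn(1, 10))
scale_sqrt = np.sqrt(scale)

# Pre-scaling both arrays by sqrt(scale) folds the weights into a single cdist call
dist_map = dist.cdist(x*scale_sqrt, y*scale_sqrt, 'sqeuclidean')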

These are the test results:

In [135]: %timeit suggested_version(inputs, testing, x0)
1 loops, best of 3: 341 ms per loop

In [136]: %timeit op_version(inputs, testing, x00)  # note: x00 is a reshape of x0
1 loops, best of 3: 3.37 s per loop
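The speedup relies on the identity scale[j]*(a - b)**2 == (sqrt(scale[j])*a - sqrt(scale[j])*b)**2, which holds for non-negative scale. Here is a minimal sketch that checks the single-call version against a plain per-dimension loop on small random data. The function names weighted_sqdist_loop and weighted_sqdist_cdist and the array sizes are made up for this check; they are not from the question.

```python
import numpy as np
from scipy.spatial.distance import cdist

def weighted_sqdist_loop(inputs, testing, scale):
    """Reference: sum scale[j] * (inputs[i, j] - testing[ii, j])**2 over j."""
    out = np.zeros((inputs.shape[0], testing.shape[0]))
    for j in range(inputs.shape[1]):
        diff = inputs[:, j][:, None] - testing[:, j][None, :]
        out += scale[j] * diff**2
    return out

def weighted_sqdist_cdist(inputs, testing, scale):
    """Same quantity from one cdist call on sqrt-scaled copies (scale >= 0)."""
    s = np.sqrt(scale)
    return cdist(inputs * s, testing * s, 'sqeuclidean')

rng = np.random.default_rng(0)
inputs = rng.standard_normal((20, 10))
testing = rng.standard_normal((500, 10))
scale = np.abs(rng.standard_normal(10))

reference = weighted_sqdist_loop(inputs, testing, scale)
fast = weighted_sqdist_cdist(inputs, testing, scale)
same = np.allclose(reference, fast)
print(same)
```

If any entry of scale were negative the square root would produce NaN, so the trick only applies to non-negative weights.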

Just make sure that as you go to larger N you don't run low on memory. That can really slow things down.
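On the memory point: with the sizes from the question (200 inputs against 1e8 testing points), the full result would hold 2e10 float64 values, roughly 160 GB. One way around that, sketched here as an assumption rather than something I have timed, is to run cdist over slices of testing and consume each block before moving on. The helper name weighted_sqdist_chunks and the chunk_size default are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def weighted_sqdist_chunks(inputs, testing, scale, chunk_size=100000):
    """Yield (start, stop, block), where block holds the weighted squared
    distances from every row of inputs to testing[start:stop]."""
    s = np.sqrt(np.asarray(scale, dtype=float)).reshape(1, -1)
    scaled_inputs = inputs * s  # scale the small array only once
    for start in range(0, testing.shape[0], chunk_size):
        stop = min(start + chunk_size, testing.shape[0])
        yield start, stop, cdist(scaled_inputs, testing[start:stop] * s, 'sqeuclidean')

# Small demonstration; real sizes would be far larger
rng = np.random.default_rng(1)
inputs = rng.standard_normal((5, 3))
testing = rng.standard_normal((23, 3))
scale = np.abs(rng.standard_normal(3))

# Reassembled here only to show the blocks line up; in practice you would
# reduce each block (for example with argmin) instead of keeping them all.
blocks = [block for _, _, block in weighted_sqdist_chunks(inputs, testing, scale, chunk_size=10)]
full = np.hstack(blocks)
```

Peak memory is then set by chunk_size rather than by the number of testing points, at the cost of one cdist call per chunk.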
