A more memory-efficient option than numpy.where?

Question

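I have a large array (several million elements), and I need to slice out a small number of them (a few hundred) based on several different criteria. I'm currently using np.where, along these lines: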

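import numpy as np

for threshold in np.arange(0, 1, .1):
    x = np.random.random(5000000)
    y = np.random.random(5000000)
    z = np.random.random(5000000)
    # Indices where all four conditions hold
    inds = np.where((x < threshold) & (y > threshold) & (z > threshold) & (z < threshold + 0.1))

# DoSomeJunk and a, b, c are placeholders from the question; they are not defined here
DoSomeJunk(a[inds], b[inds], c[inds])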

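I then use inds to pull the right points out of the various arrays. However, I get a MemoryError on the np.where line. I've seen in some related posts that np.where can be a memory hog and copies data.

Does having multiple & operators in there mean the data is copied multiple times? Is there a more efficient way to slice the data that is less memory-intensive and also preserves the list of indices I want, so that I can reuse the same slice in multiple places later?

Note that the example I posted doesn't actually produce the error, but its structure is similar to what I have.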

Answer 1

Each condition creates a temporary boolean array the same size as x, y, and z. To optimize this, you can build the mask iteratively:

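import numpy as np

for threshold in np.arange(0, 1, .1):
    x = np.random.random(5000000)
    y = np.random.random(5000000)
    z = np.random.random(5000000)
    # Start with one boolean mask, then fold each further condition into it in place
    inds = x < threshold
    inds &= y > threshold
    inds &= z > threshold
    inds &= z < threshold + 0.1

# DoSomeJunk and a, b, c are placeholders from the question; they are not defined here
DoSomeJunk(a[inds], b[inds], c[inds])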

For this example, this reduces memory usage from 160 MB to 40 MB.
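
The question also asks for a list of indices that can be reused later. One way this might be done is sketched below: the mask is built in two preallocated boolean buffers through the `out=` argument of NumPy's comparison ufuncs, then converted to an integer index array with `np.flatnonzero`. The function name `select_indices`, its `width` parameter, and the smaller array size are assumptions made for illustration and are not part of the original post.

```python
import numpy as np


def select_indices(x, y, z, threshold, width=0.1):
    """Return indices where x < t, y > t and t < z < t + width.

    Hypothetical helper: it reuses two boolean buffers instead of
    allocating a new temporary array for every comparison.
    """
    mask = np.empty(x.shape, dtype=bool)  # running result
    tmp = np.empty(x.shape, dtype=bool)   # scratch buffer for each comparison

    np.less(x, threshold, out=mask)
    for arr, op, bound in (
        (y, np.greater, threshold),
        (z, np.greater, threshold),
        (z, np.less, threshold + width),
    ):
        op(arr, bound, out=tmp)              # write the comparison into tmp
        np.logical_and(mask, tmp, out=mask)  # fold it into mask in place

    # Compact integer indices that can be reused on other arrays
    return np.flatnonzero(mask)


rng = np.random.default_rng(0)
n = 100_000  # assumed size; smaller than the question's 5,000,000 so it runs quickly
x, y, z = rng.random(n), rng.random(n), rng.random(n)

inds = select_indices(x, y, z, threshold=0.3)
# The same index array works on any array of the same length
x_sel, y_sel, z_sel = x[inds], y[inds], z[inds]
```

When only a few hundred elements match, the integer index array is tiny compared with a full-length mask, so keeping `inds` around for later use costs very little.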
