Saving a NumPy array is much slower after indexing

Question

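I've run into a problem where saving a NumPy array after indexing it is much slower than saving the original array. Here is a minimal reproducible example: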

import time
import numpy as np

def mre(save_path):
    array = np.zeros((245, 233, 6))

    start = time.time()
    for i in range(1000):
        with open(save_path + '/array1_' + str(i), "wb") as file:
            np.save(file, array)
    end = time.time()
    print(f"No indexing: {end - start}s")

    start = time.time()
    for i in range(1000):
        array2 = array[:,:,[0,1,2,3,4,5]]
        with open(save_path + '/array2_' + str(i), "wb") as file:
            np.save(file, array2)
    end = time.time()
    print(f"With indexing: {end - start}s")
    print("Arrays are equal: ", np.array_equal(array, array2))

The results are:

No indexing: 3.168931007385254s
With indexing: 9.889702320098877s
Arrays are equal:  True

So according to NumPy the two arrays are equal, yet saving the indexed one takes roughly three times as long. Does anyone know why?

python numpy io numpy-ndarray
1 Answer

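That is because you are performing an extra operation:

array[:,:,[0,1,2,3,4,5]]
is not free. Indexing with a list (advanced indexing) always returns a copy of the data rather than a view, so every iteration of the loop allocates a new array and copies all of the elements into it before saving.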

import numpy as np

array = np.zeros((245, 233, 6))
array2 = array
print(array is array2) # This is True
print(id(array), id(array2))

array2 = array[:,:,[0,1,2,3,4,5]] # New memory allocation
print(array is array2) # This is False
print(id(array), id(array2))
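
The copy is one cost. A second possible factor, which is an assumption here and not something measured above, is memory layout: the result of advanced indexing is not guaranteed to be C-contiguous, and np.save may take a slower path for arrays that are not contiguous. The sketch below checks both points with np.shares_memory and the flags attribute, then times saves into an in-memory buffer. The helper time_saves and the repeat count are hypothetical names chosen for illustration, and the timings will vary by machine and NumPy version.

```python
import io
import time

import numpy as np

array = np.zeros((245, 233, 6))

# Basic slicing returns a view: no data is copied.
view = array[:, :, 0:6]
# Indexing with a list (advanced indexing) always returns a copy.
copy = array[:, :, [0, 1, 2, 3, 4, 5]]

print("view shares memory:", np.shares_memory(array, view))
print("copy shares memory:", np.shares_memory(array, copy))

# The layout of the copy is an implementation detail, so inspect it
# instead of assuming it.
print("view C-contiguous:", view.flags["C_CONTIGUOUS"])
print("copy C-contiguous:", copy.flags["C_CONTIGUOUS"])


def time_saves(arr, repeats=100):
    """Hypothetical helper: time repeated np.save calls into memory."""
    start = time.perf_counter()
    for _ in range(repeats):
        np.save(io.BytesIO(), arr)
    return time.perf_counter() - start


print(f"save view:            {time_saves(view):.3f}s")
print(f"save copy:            {time_saves(copy):.3f}s")
# np.ascontiguousarray forces a C-contiguous layout before saving.
print(f"save contiguous copy: {time_saves(np.ascontiguousarray(copy)):.3f}s")
```

If the selected indices form a regular range, as they do in the question, a basic slice such as array[:, :, 0:6] avoids the copy entirely. Otherwise, doing the indexing once outside the loop, or passing the result through np.ascontiguousarray before saving, may help.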