I've run into a problem where saving a numpy array after indexing it makes the save much slower. Here is a minimal reproducible example:
import time
import numpy as np

def mre(save_path):
    array = np.zeros((245, 233, 6))

    start = time.time()
    for i in range(1000):
        with open(save_path + '/array1_' + str(i), "wb") as file:
            np.save(file, array)
    end = time.time()
    print(f"No indexing: {end - start}s")

    start = time.time()
    for i in range(1000):
        array2 = array[:,:,[0,1,2,3,4,5]]
        with open(save_path + '/array2_' + str(i), "wb") as file:
            np.save(file, array2)
    end = time.time()
    print(f"With indexing: {end - start}s")

    print("Arrays are equal: ", np.array_equal(array, array2))
The results are:
No indexing: 3.168931007385254s
With indexing: 9.889702320098877s
Arrays are equal: True
So according to numpy the arrays are equal, yet saving the indexed one is still much slower. Does anyone know why?
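That's because you are performing an extra operation:
array[:,:,[0,1,2,3,4,5]]
is not free. The main issue here is that indexing with a list creates a copy of the data, which means a new memory allocation on every iteration of the loop.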
import numpy as np
array = np.zeros((245, 233, 6))
array2 = array
print(array is array2) # This is True
print(id(array), id(array2))
array2 = array[:,:,[0,1,2,3,4,5]] # New memory allocation
print(array is array2) # This is False
print(id(array), id(array2))
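Note that `is` and `id()` only show that the two names refer to different Python objects; a view would also be a different object. As a rough sketch of how to check the copy more directly, `np.shares_memory` reports whether two arrays use the same buffer. The snippet below also times the indexing step on its own and prints the contiguity flags, on the assumption (not confirmed in the question) that the memory layout of the indexed result may matter to `np.save` as well. The names `fancy` and `sliced` are just illustrative.

```python
import timeit

import numpy as np

array = np.zeros((245, 233, 6))

# Advanced (list) indexing returns a new array with its own buffer.
fancy = array[:, :, [0, 1, 2, 3, 4, 5]]
# Basic slicing selects the same elements but returns a view.
sliced = array[:, :, 0:6]

print("fancy shares memory: ", np.shares_memory(array, fancy))   # False
print("sliced shares memory:", np.shares_memory(array, sliced))  # True

# Assumption: the layout of the copy may differ from the original and
# may affect how the array is written; print the flags to check this
# on your own NumPy version.
print("original C-contiguous:", array.flags.c_contiguous)
print("fancy C-contiguous:   ", fancy.flags.c_contiguous)

# Time only the indexing step, without any file I/O.
n = 1000
t_fancy = timeit.timeit(lambda: array[:, :, [0, 1, 2, 3, 4, 5]], number=n)
t_slice = timeit.timeit(lambda: array[:, :, 0:6], number=n)
print(f"fancy indexing x{n}: {t_fancy:.4f}s")
print(f"basic slicing  x{n}: {t_slice:.4f}s")
```

If the selection can be written as a slice, as it can here, slicing avoids the copy entirely. If the indexing time alone does not account for the gap you measured, the layout of the copied array would be the next thing to look at.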