I'm using numpy-2.1.2-cp313-cp313-win_amd64. When I load an array with np.memmap, the array's shape and data are corrupted. Here is a minimal reproducible example:
>>> import numpy as np
>>> a = np.arange(65536)
>>> a
array([ 0, 1, 2, ..., 65533, 65534, 65535])
>>> np.save('f.npy', a)
# Loading the array with np.load works fine:
>>> b = np.load('f.npy')
>>> b
array([ 0, 1, 2, ..., 65533, 65534, 65535])
>>> a.dtype
dtype('int64')
# With np.memmap, the shape is wrong and extra elements appear at the beginning:
>>> c = np.memmap('f.npy', dtype=np.int64, mode='r')
>>> c
memmap([ 379676406402707, 7166182912910098550, 4064846277420656498,
..., 65533, 65534,
65535])
>>> c.shape
(65552,)
# When I specify the shape, the same extra elements appear at the beginning and the last elements are cut off:
>>> c = np.memmap('f.npy', dtype=np.int64, mode='r', shape=a.shape)
>>> c
memmap([ 379676406402707, 7166182912910098550, 4064846277420656498,
..., 65517, 65518,
65519])
Using np.load('f.npy', mmap_mode='r') works in this example, but with my real data it raises ValueError: Cannot load file containing pickled data when allow_pickle=False. If I set allow_pickle=True, I get a different error: UnpicklingError: Failed to interpret file 'f.npy' as a pickle.
So I want to use np.memmap instead. How can I do this correctly?
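Files created by np.save and by np.memmap are not compatible. np.save writes a .npy file: a header (magic string, format version, dtype, shape and memory order) followed by the raw data. np.memmap, on the other hand, treats the whole file as raw bytes. When a .npy file is opened with np.memmap, the header bytes are therefore read as array elements. In your example the header takes 16 × 8 = 128 bytes, i.e. 16 int64 values, which explains the 65552-element shape and the garbage at the start (and, with shape=a.shape, the 16 missing values at the end). Each of the two approaches works correctly on its own, as long as you save and load with the same one: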
# 1) np.save + np.load (optionally with mmap_mode): the .npy header is handled for you
>>> a = np.arange(65536)
>>> np.save('f.npy', a)
>>> b = np.load('f.npy')
>>> b
array([ 0, 1, 2, ..., 65533, 65534, 65535])
>>> b.shape
(65536,)
>>> c = np.load('f.npy', mmap_mode='r')
>>> c
memmap([ 0, 1, 2, ..., 65533, 65534, 65535])
>>> c.shape
(65536,)
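If the goal is to fill a file through a memory map but still end up with a regular .npy file, np.lib.format.open_memmap can be used: it writes the .npy header first and then maps only the data region behind it. A minimal sketch, assuming a throwaway temporary path (the path is only for illustration):

```python
import os
import tempfile

import numpy as np

# Hypothetical location for the demo file
path = os.path.join(tempfile.mkdtemp(), "f.npy")
a = np.arange(65536, dtype=np.int64)

# open_memmap writes a proper .npy header, then memory-maps the data region after it
out = np.lib.format.open_memmap(path, mode="w+", dtype=a.dtype, shape=a.shape)
out[:] = a
out.flush()
del out  # release the mapping before reopening the file

# The result is a normal .npy file, so both np.load variants understand it
b = np.load(path)
c = np.load(path, mmap_mode="r")
print(b.shape, c.shape)  # (65536,) (65536,)
```

This keeps a single file format throughout, so there is no need to remember the dtype and shape separately.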
# 2) np.memmap for both writing and reading: raw bytes only, no header
>>> a = np.arange(65536)
>>> outFile = np.memmap('f.npy', dtype=a.dtype, mode='w+', shape=a.shape)
>>> outFile[:] = a
>>> outFile.flush()
>>> inFile = np.memmap('f.npy', dtype=np.int64, mode='r')
>>> inFile
memmap([ 0, 1, 2, ..., 65533, 65534, 65535])
>>> inFile.shape
(65536,)
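A file written this way contains nothing but the raw data, which also means np.load cannot read it: it does not find the .npy magic string and falls back to treating the file as a pickle. The sketch below reproduces the two errors from the question on such a file; the temporary path and the f.dat name are assumptions for illustration only:

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical location for the raw demo file
raw_path = os.path.join(tempfile.mkdtemp(), "f.dat")
a = np.arange(65536, dtype=np.int64)

# np.memmap with mode="w+" stores only the element bytes: no dtype, no shape
raw = np.memmap(raw_path, dtype=a.dtype, mode="w+", shape=a.shape)
raw[:] = a
raw.flush()
del raw
print(os.path.getsize(raw_path) == a.nbytes)  # True: the file is pure data

err_default = None
err_pickle = None

# No .npy magic string, so np.load assumes a pickle and refuses by default
try:
    np.load(raw_path)
except ValueError as exc:
    err_default = str(exc)

# With pickles allowed, unpickling the raw bytes fails instead
try:
    np.load(raw_path, allow_pickle=True)
except pickle.UnpicklingError as exc:
    err_pickle = str(exc)

print(err_default)
print(err_pickle)
```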
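Mixing the two (saving with np.save and loading with np.memmap, or vice versa) gives corrupted results. This is most likely also the cause of the errors with your real data: np.load only treats a file as .npy if it starts with the .npy magic string, and otherwise assumes the file is a pickle. A raw file written through np.memmap therefore raises the allow_pickle ValueError, and with allow_pickle=True the unpickler fails. To memory-map a .npy file, use np.load(..., mmap_mode='r'); to read a raw file, use np.memmap with the correct dtype and shape.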
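If you really need to open a .npy file with np.memmap itself, you can skip the header by passing its size as offset. The sketch below parses the header with np.lib.format.read_magic and read_array_header_1_0 / read_array_header_2_0, which is roughly what np.load(..., mmap_mode='r') does internally. It is an illustration under the assumption that the file uses format version 1.0 or 2.0 (np.save uses 1.0 for a plain int64 array); the temporary path is hypothetical:

```python
import os
import tempfile

import numpy as np

# Hypothetical location for the demo file
path = os.path.join(tempfile.mkdtemp(), "f.npy")
a = np.arange(65536, dtype=np.int64)
np.save(path, a)

# Parse the .npy header to learn dtype, shape, order and where the data begins
with open(path, "rb") as fp:
    version = np.lib.format.read_magic(fp)
    if version == (1, 0):
        shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(fp)
    elif version == (2, 0):
        shape, fortran_order, dtype = np.lib.format.read_array_header_2_0(fp)
    else:
        raise ValueError(f"unsupported .npy version {version}")
    offset = fp.tell()  # header size in bytes = start of the array data

# Map only the data region that follows the header
c = np.memmap(path, dtype=dtype, mode="r", shape=shape,
              order="F" if fortran_order else "C", offset=offset)

# How many int64 values the header would occupy if read as data
print(offset // dtype.itemsize)
```

The printed count corresponds to the extra leading values that a plain np.memmap call without offset would show; in the question that difference was 65552 - 65536 = 16.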