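I want to plot the values of mtx.todense()[1:2,:] at the positions x,y = cspdata2d[index] into outimg, where index comes from the second dimension of mtx.
It is similar to scanpy.pl.spatial, but I want to plot to pixels. Also, scanpy is slow on huge datasets.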
Here is a relatively efficient Numpy solution:
import numpy as np
from scipy.sparse import coo_matrix  # public import path; scipy.sparse.coo is a deprecated private module
# Compute `mtx.toarray()[1:2,:]`
# Time: 0.76 s
selected_row = 1
select = mtx.row == selected_row
assert np.array_equiv(mtx.row[select], selected_row)
slice_data = mtx.data[select]
slice_rows = np.zeros(np.count_nonzero(select), dtype=np.int32)  # integer row indices (all 0) for the 1-row slice
slice_cols = mtx.col[select]
mtx_sparse_slice = coo_matrix((slice_data, (slice_rows, slice_cols)), shape=(1,mtx.shape[1]))
mtx_dense_slice = mtx_sparse_slice.toarray()
# Compute `x,y = cspdata2d[index]` where index is from the 2nd dimensions of mtx
# Assume cspdata2d contains [x,y] values in each row and not [y,x] ones
# Assume the zeros values are not interesting so they are not included
# Time: 0.42 s
index = mtx_dense_slice[0,:]
nnz_index = np.nonzero(index)[0]
x = cspdata2d[nnz_index,0]
y = cspdata2d[nnz_index,1]
# Compute outimg
# Time: 0.75 s
# The shape needs max + 1 so that the largest coordinate is still a valid index
outimg = np.zeros(shape=cspdata2d.max(axis=0) + 1, dtype=np.int16)
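# Note: this stores the column indices (truncated to int16), not the matrix values index[nnz_index]
outimg[x, y] = nnz_index.astype(np.int16)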
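The three steps above can be checked on a tiny matrix where the dense result is cheap to compute. This is only a sketch: the helper name `dense_row_from_coo`, the 3x5 matrix `small` and the coordinate table `coords` are made up for illustration, and the last step stores the matrix values (rather than the column indices) on the assumption that this is what the question asks for.

```python
import numpy as np
from scipy.sparse import coo_matrix

def dense_row_from_coo(mtx, selected_row):
    """Return row `selected_row` of a COO matrix as a dense (1, ncols) array."""
    select = mtx.row == selected_row
    rows = np.zeros(np.count_nonzero(select), dtype=np.int32)
    sparse_slice = coo_matrix(
        (mtx.data[select], (rows, mtx.col[select])),
        shape=(1, mtx.shape[1]),
    )
    return sparse_slice.toarray()

# Hypothetical tiny input: 3 rows, 5 columns, 4 stored values
small = coo_matrix(
    (np.array([7, 3, 9, 4]), (np.array([0, 1, 1, 2]), np.array([2, 0, 4, 1]))),
    shape=(3, 5),
)
row1 = dense_row_from_coo(small, 1)

# The mask-based slice must match the naive dense slice
assert np.array_equal(row1, small.toarray()[1:2, :])

# Hypothetical [x, y] coordinates, one row per matrix column
coords = np.array([[0, 0], [1, 2], [2, 1], [0, 3], [3, 3]])
nnz = np.nonzero(row1[0])[0]
x, y = coords[nnz, 0], coords[nnz, 1]

# max + 1 so that the largest coordinate is a valid pixel index
outimg = np.zeros(shape=tuple(coords.max(axis=0) + 1), dtype=np.int64)
outimg[x, y] = row1[0, nnz]  # store the values of the selected row
```

If several columns map to the same pixel, the fancy-indexed assignment keeps only one of them, so the result depends on which value is written last.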
Here is the setup used to test this with random values:
mtx_nnz = 357_558_594
mtx_nrows = 30_562
mtx_ncols = 96_629_557
mtx_data = np.random.randint(0, mtx_ncols, size=mtx_nnz).astype(np.int64) # Apparently values <= mtx_ncols
mtx_rows = np.random.randint(0, mtx_nrows, size=mtx_nnz).astype(np.int32)
mtx_cols = np.random.randint(0, mtx_ncols, size=mtx_nnz).astype(np.int32)
mtx = coo_matrix((mtx_data, (mtx_rows, mtx_cols)), shape=(mtx_nrows, mtx_ncols))
del mtx_data, mtx_rows, mtx_cols # Save some memory space (mtx_ncols is still needed below)
cspdata2d = np.empty(shape=(mtx_ncols,2), dtype=np.int32)
cspdata2d[:,0] = np.random.randint(0, 51_968, size=mtx_ncols)
cspdata2d[:,1] = np.random.randint(0, 40_141, size=mtx_ncols)
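On my machine, the overall execution time is about 2 seconds, which is relatively fast (though not optimal) for (sequential) Numpy code, given the size of the input data (>= 10 GiB of data read/written).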
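If more than one row has to be drawn, one possible variation (not part of the solution above, and not timed on the full dataset) is to convert the matrix to CSR once and read each row's columns and values directly from `indptr`, `indices` and `data`, which avoids building a dense row of `mtx.shape[1]` elements. The helper name `row_entries` and the tiny inputs below are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix

def row_entries(csr, selected_row):
    """Return (column indices, values) stored for one row of a CSR matrix."""
    start, end = csr.indptr[selected_row], csr.indptr[selected_row + 1]
    return csr.indices[start:end], csr.data[start:end]

# Hypothetical tiny input: 3 rows, 5 columns, 4 stored values
small = coo_matrix(
    (np.array([7, 3, 9, 4]), (np.array([0, 1, 1, 2]), np.array([2, 0, 4, 1]))),
    shape=(3, 5),
)
small_csr = small.tocsr()  # one-time conversion; duplicate entries are summed

cols, vals = row_entries(small_csr, 1)

# Hypothetical [x, y] coordinates, one row per matrix column
coords = np.array([[0, 0], [1, 2], [2, 1], [0, 3], [3, 3]])
outimg = np.zeros(shape=tuple(coords.max(axis=0) + 1), dtype=np.int64)
outimg[coords[cols, 0], coords[cols, 1]] = vals
```

The conversion itself has to process every stored entry and needs additional memory, so it is likely to pay off only when many rows are extracted from the same matrix.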