How to plot an anndata-like matrix into a grayscale 2D numpy array

Problem description (votes: 0, answers: 1)

Input data

  • mtx: scipy.sparse matrix
    • mtx.todense()[0:2,0:8] = matrix([[4050, 24, 21, 3, 0, 3, 2, 1], [1437, 17, 17, 3, 0, 3, 2, 1]])
  • cspdata2d:numpy.ndarray
    • cspdata2d.shape = (96629557, 2)
    • cspdata2d[0:8,:] = array([[43426, 1414], [ 5496, 20015], [23193, 19957], [ 2252, 28571], [10910, 22382], [ 1385, 19958], [ 5508, 22828], [25183, 22533]], dtype=int32)

Output

  • outimg: np.zeros(shape=cspdata2d.max(axis=0), dtype=np.int16)
    • outimg.shape = (51968, 40141)

I want to plot the values of `mtx.todense()[1:2,:]` at the positions `x,y = cspdata2d[index]` into `outimg`, where `index` runs over the second dimension of mtx.

This is similar to `scanpy.pl.spatial`, but I want to plot directly to pixels. scanpy is also slow on huge datasets.

numpy plot scipy anndata
1 Answer

Here is a relatively efficient NumPy solution:

import numpy as np
from scipy.sparse import coo_matrix

# Compute `mtx.toarray()[1:2,:]`
# Time: 0.76 s
selected_row = 1
select = mtx.row == selected_row
assert np.array_equiv(mtx.row[select], selected_row)
slice_data = mtx.data[select]
slice_rows = np.zeros(np.count_nonzero(select), dtype=np.int32)  # integer row indices
slice_cols = mtx.col[select]
mtx_sparse_slice = coo_matrix((slice_data, (slice_rows, slice_cols)), shape=(1,mtx.shape[1]))
mtx_dense_slice = mtx_sparse_slice.toarray()

# Compute `x,y = cspdata2d[index]` where index is from the 2nd dimensions of mtx
# Assume cspdata2d contains [x,y] values in each row and not [y,x] ones
# Assume the zeros values are not interesting so they are not included
# Time: 0.42 s
index = mtx_dense_slice[0,:]
nnz_index = np.nonzero(index)[0]
x = cspdata2d[nnz_index,0]
y = cspdata2d[nnz_index,1]

# Compute outimg
# Time: 0.75 s
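# `+ 1` so that the largest coordinate is still a valid index (0-based coordinates assumed)
outimg = np.zeros(shape=cspdata2d.max(axis=0) + 1, dtype=np.int16)
# Note: this stores the column indices (wrapped to int16), not the matrix values
outimg[x, y] = nnz_index.astype(np.int16)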
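
To make the three steps above easier to check by hand, here is a minimal sketch on toy data. The tiny `mtx` and `cspdata2d` values are made up for illustration, and this variant assumes the goal is to store the matrix values at each pixel (as the question describes) rather than the column indices.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy stand-ins for the real inputs (hypothetical values).
# 2 rows x 4 columns; row 1 has non-zeros in columns 0 and 2.
mtx = coo_matrix(
    (np.array([5, 7, 9]), (np.array([0, 1, 1]), np.array([3, 0, 2]))),
    shape=(2, 4),
)
# One (x, y) pixel coordinate per matrix column.
cspdata2d = np.array([[0, 1], [2, 2], [1, 0], [2, 1]], dtype=np.int32)

# Step 1: dense copy of the selected row.
selected_row = 1
select = mtx.row == selected_row
slice_rows = np.zeros(np.count_nonzero(select), dtype=np.int32)
row_values = coo_matrix(
    (mtx.data[select], (slice_rows, mtx.col[select])),
    shape=(1, mtx.shape[1]),
).toarray()[0]

# Step 2: coordinates of the columns holding non-zero values.
nnz_index = np.nonzero(row_values)[0]
x = cspdata2d[nnz_index, 0]
y = cspdata2d[nnz_index, 1]

# Step 3: write the values (not the column indices) into the image.
outimg = np.zeros(tuple(cspdata2d.max(axis=0) + 1), dtype=np.int16)
outimg[x, y] = row_values[nnz_index].astype(np.int16)
print(outimg)
# [[0 7 0]
#  [9 0 0]
#  [0 0 0]]
```

Values larger than 32767 would not fit in `np.int16`; a wider dtype may be needed for real count data.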

Here is the setup used to test this with random values:

mtx_nnz = 357_558_594
mtx_nrows = 30_562
mtx_ncols = 96_629_557
mtx_data = np.random.randint(0, mtx_ncols, size=mtx_nnz).astype(np.int64) # Apparently values <= mtx_ncols
mtx_rows = np.random.randint(0, mtx_nrows, size=mtx_nnz).astype(np.int32)
mtx_cols = np.random.randint(0, mtx_ncols, size=mtx_nnz).astype(np.int32)
mtx = coo_matrix((mtx_data, (mtx_rows, mtx_cols)), shape=(mtx_nrows, mtx_ncols))
del mtx_data, mtx_rows, mtx_cols # Save some memory space

cspdata2d = np.empty(shape=(mtx_ncols,2), dtype=np.int32)
cspdata2d[:,0] = np.random.randint(0, 51_968, size=mtx_ncols)
cspdata2d[:,1] = np.random.randint(0, 40_141, size=mtx_ncols)
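
The full-size setup needs many gigabytes of RAM. For a quick check on a smaller machine, a scaled-down generator along these lines might help. The function name `make_test_data` and all default sizes are hypothetical and only meant to illustrate the same layout at a smaller scale.

```python
import numpy as np
from scipy.sparse import coo_matrix

def make_test_data(nrows=30, ncols=10_000, nnz=50_000,
                   height=520, width=400, seed=0):
    """Build a small random (mtx, cspdata2d) pair; all sizes are illustrative."""
    rng = np.random.default_rng(seed)
    data = rng.integers(1, 100, size=nnz).astype(np.int64)
    rows = rng.integers(0, nrows, size=nnz).astype(np.int32)
    cols = rng.integers(0, ncols, size=nnz).astype(np.int32)
    mtx = coo_matrix((data, (rows, cols)), shape=(nrows, ncols))

    # One (x, y) pixel coordinate per matrix column.
    cspdata2d = np.empty((ncols, 2), dtype=np.int32)
    cspdata2d[:, 0] = rng.integers(0, height, size=ncols)
    cspdata2d[:, 1] = rng.integers(0, width, size=ncols)
    return mtx, cspdata2d

mtx, cspdata2d = make_test_data()
print(mtx.shape, cspdata2d.shape)
```

Fixing the seed keeps runs reproducible, which makes it easier to compare alternative implementations on identical inputs.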

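On my machine, the overall execution time is about 2 seconds, which is relatively fast (though not optimal) for (sequential) NumPy code given the size of the input data (>= 10 GiB of data read/written).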
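
One possible way to reduce the work further, sketched here as an assumption and not measured on the full data set, is to skip the dense intermediate row and scatter the selected COO entries directly. The helper name `rasterize_row` is hypothetical. It assumes the matrix has no duplicate (row, col) entries (otherwise `mtx.sum_duplicates()` can be called first) and that coordinates are 0-based.

```python
import numpy as np
from scipy.sparse import coo_matrix

def rasterize_row(mtx, cspdata2d, selected_row, dtype=np.int16):
    """Scatter the non-zero values of one row of a COO matrix into an image.

    Assumes `mtx` holds no duplicate (row, col) entries.
    """
    # Keep only stored entries of the selected row that are actually non-zero.
    select = (mtx.row == selected_row) & (mtx.data != 0)
    cols = mtx.col[select]
    outimg = np.zeros(tuple(cspdata2d.max(axis=0) + 1), dtype=dtype)
    # If several columns map to the same pixel, only one value is kept.
    outimg[cspdata2d[cols, 0], cspdata2d[cols, 1]] = mtx.data[select].astype(dtype)
    return outimg

# Toy usage with made-up values: row 1 holds 7 in column 0 and 9 in column 2.
toy_mtx = coo_matrix(
    (np.array([5, 7, 9]), (np.array([0, 1, 1]), np.array([3, 0, 2]))),
    shape=(2, 4),
)
toy_xy = np.array([[0, 1], [2, 2], [1, 0], [2, 1]], dtype=np.int32)
toy_img = rasterize_row(toy_mtx, toy_xy, selected_row=1)
print(toy_img)
```

This avoids allocating a dense array with one element per matrix column, at the cost of relying on the COO entries being free of duplicates.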
