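A simple way to store the upper triangle (including the diagonal) of a SciPy sparse matrix in a local file, load the file into shared memory (shm), and reconstruct the sparse matrix from shm?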
ChatGPT 4o offered a solution that seems overly complicated, built around three separate arrays: data, indices, and indptr. Is that really necessary?
import numpy as np
import scipy.sparse as sp
from multiprocessing import shared_memory
# Example matrix creation
matrix = sp.random(100, 100, density=0.1, format='csr')
# Step 1: Extract and save the upper triangular part (including the diagonal)
upper_triangular_matrix = sp.triu(matrix)
sp.save_npz('upper_triangular_matrix.npz', upper_triangular_matrix)
# Step 2: Load the matrix from the file
loaded_matrix = sp.load_npz('upper_triangular_matrix.npz')
# Step 3: Copy the matrix into shared memory
# A CSR matrix is fully described by its data, indices, and indptr arrays
data = loaded_matrix.data
indices = loaded_matrix.indices
indptr = loaded_matrix.indptr
# Create shared memory blocks for each array
shm_data = shared_memory.SharedMemory(create=True, size=data.nbytes)
shm_indices = shared_memory.SharedMemory(create=True, size=indices.nbytes)
shm_indptr = shared_memory.SharedMemory(create=True, size=indptr.nbytes)
# Copy the arrays into the shared memory blocks
np.copyto(np.ndarray(data.shape, dtype=data.dtype, buffer=shm_data.buf), data)
np.copyto(np.ndarray(indices.shape, dtype=indices.dtype, buffer=shm_indices.buf), indices)
np.copyto(np.ndarray(indptr.shape, dtype=indptr.dtype, buffer=shm_indptr.buf), indptr)
# Access the shared memory blocks
shared_data = np.ndarray(data.shape, dtype=data.dtype, buffer=shm_data.buf)
shared_indices = np.ndarray(indices.shape, dtype=indices.dtype, buffer=shm_indices.buf)
shared_indptr = np.ndarray(indptr.shape, dtype=indptr.dtype, buffer=shm_indptr.buf)
# Create a shared memory CSR matrix
shared_csr_matrix = sp.csr_matrix((shared_data, shared_indices, shared_indptr), shape=loaded_matrix.shape)
# Accessing the shared memory matrix
print(shared_csr_matrix)
# Clean up shared memory
shm_data.close()
shm_data.unlink()
shm_indices.close()
shm_indices.unlink()
shm_indptr.close()
shm_indptr.unlink()
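On whether the three arrays are necessary: the CSR format is defined by exactly those three arrays plus the shape, so a shared-memory copy has to carry all of them. The bookkeeping can be hidden behind two helpers, though. The sketch below is one possible way to do that; `csr_to_shm` and `csr_from_shm` are hypothetical names (not SciPy functions), and the layout of the `meta` dict is an assumption made for illustration.

```python
import numpy as np
import scipy.sparse as sp
from multiprocessing import shared_memory

CSR_FIELDS = ("data", "indices", "indptr")

def csr_to_shm(matrix):
    """Copy the three CSR arrays into shared memory; return (meta, blocks)."""
    meta = {"shape": matrix.shape, "arrays": {}}
    blocks = []
    for field in CSR_FIELDS:
        arr = getattr(matrix, field)
        # SharedMemory rejects size 0, so reserve at least one byte
        shm = shared_memory.SharedMemory(create=True, size=max(arr.nbytes, 1))
        np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)[:] = arr
        meta["arrays"][field] = (shm.name, arr.shape, arr.dtype.str)
        blocks.append(shm)
    return meta, blocks

def csr_from_shm(meta):
    """Attach to the blocks described by meta and rebuild the CSR matrix."""
    blocks, arrays = [], []
    for field in CSR_FIELDS:
        shm_name, shape, dtype = meta["arrays"][field]
        shm = shared_memory.SharedMemory(name=shm_name)
        arrays.append(np.ndarray(shape, dtype=dtype, buffer=shm.buf))
        blocks.append(shm)
    matrix = sp.csr_matrix(tuple(arrays), shape=meta["shape"], copy=False)
    return matrix, blocks

# Round trip (reader and owner live in the same process here for brevity)
upper = sp.triu(sp.random(100, 100, density=0.1, format="csr", random_state=0),
                format="csr")
meta, owner_blocks = csr_to_shm(upper)
rebuilt, reader_blocks = csr_from_shm(meta)
same = np.array_equal(rebuilt.toarray(), upper.toarray())
rebuilt_nnz = rebuilt.nnz

# Drop the matrix before closing, since it may still view the shared buffers
del rebuilt
for shm in reader_blocks:
    shm.close()
for shm in owner_blocks:
    shm.close()
    shm.unlink()
```

Only `meta` (a few names, shapes, and dtype strings) would need to be passed to another process; the arrays themselves stay in shared memory.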
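In the end my matrix turned out not to be truly sparse, so SciPy's sparse formats did not actually save any memory. The main property of my matrix is that it is symmetric.
ChatGPT 4o then gave the basic steps for handling a symmetric matrix as though it were a full (dense) matrix while still trying to save memory, which is essentially what I was asking for: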
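Step 1: Save the upper-triangular matrix to a file
First, generate and save the upper-triangular part of the symmetric matrix (if it has not been saved already).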
import numpy as np
import os
from multiprocessing import shared_memory

filename = "upper_diag.npy"

# Function to create a symmetric matrix and save the upper-diagonal part
def create_and_save_upper_diag(filename, size=5):
    A = np.random.rand(size, size)
    A = (A + A.T) / 2  # Make A symmetric
    upper_diag_A = np.triu(A)
    np.save(filename, upper_diag_A)  # Save to a file

# Check if the file exists, if not create and save the matrix
if not os.path.exists(filename):
    create_and_save_upper_diag(filename)
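One caveat: `np.triu(A)` still produces a full size × size array with zeros below the diagonal, so the saved file is no smaller than the full matrix. If the goal is to roughly halve storage, one option is to keep only the n(n+1)/2 upper-triangle values in a 1-D array. A minimal sketch, assuming row-by-row packing via `np.triu_indices`; the helper names and the file name `upper_packed.npy` are hypothetical.

```python
import os
import tempfile
import numpy as np

def pack_upper(A):
    """Return the upper triangle (with diagonal) of A as a 1-D array."""
    rows, cols = np.triu_indices(A.shape[0])
    return A[rows, cols]

def unpack_upper(packed, n):
    """Rebuild the full symmetric n x n matrix from the packed values."""
    rows, cols = np.triu_indices(n)
    full = np.zeros((n, n), dtype=packed.dtype)
    full[rows, cols] = packed
    full[cols, rows] = packed  # mirror across the diagonal
    return full

n = 5
rng = np.random.default_rng(0)
A = rng.random((n, n))
A = (A + A.T) / 2  # make A symmetric

packed = pack_upper(A)  # n * (n + 1) // 2 values instead of n * n
path = os.path.join(tempfile.mkdtemp(), "upper_packed.npy")
np.save(path, packed)

restored = unpack_upper(np.load(path), n)
```

For n = 5 this stores 15 values rather than 25; the ratio approaches one half as n grows.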
Step 2: Load from the file into SHM
Once you have the file, you can load it into shared memory.
# Function to load a matrix from a .npy file into shared memory
def load_to_shm(filename):
    # Load matrix from file
    upper_diag_A = np.load(filename)
    # Create shared memory block
    shm = shared_memory.SharedMemory(create=True, size=upper_diag_A.nbytes)
    # Create a NumPy array backed by shared memory
    upper_diag_A_shm = np.ndarray(upper_diag_A.shape, dtype=upper_diag_A.dtype, buffer=shm.buf)
    upper_diag_A_shm[:] = upper_diag_A[:]  # Copy data to shared memory
    return upper_diag_A_shm, shm

# Load the matrix into shared memory
upper_diag_A_shm, shm = load_to_shm(filename)
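For the block to be useful to other processes, they need its name, shape, and dtype, since the array object itself cannot be handed over. A sketch of how a reader might attach, assuming those three values are passed along by some means (a command-line argument, a queue, a small sidecar file). `attach_array` is a hypothetical helper, and both sides run in one process here only to keep the example short.

```python
import numpy as np
from multiprocessing import shared_memory

def attach_array(name, shape, dtype):
    """Attach to an existing block by name and view it as a NumPy array."""
    shm = shared_memory.SharedMemory(name=name)  # create defaults to False
    return np.ndarray(shape, dtype=dtype, buffer=shm.buf), shm

# Owner side: create a block and fill it (a stand-in for loading from a file)
source = np.triu(np.arange(16, dtype=np.float64).reshape(4, 4))
owner = shared_memory.SharedMemory(create=True, size=source.nbytes)
owner_view = np.ndarray(source.shape, dtype=source.dtype, buffer=owner.buf)
owner_view[:] = source

# Reader side: normally a different process
reader_view, reader = attach_array(owner.name, source.shape, source.dtype)
reader_copy = reader_view.copy()  # private copy that outlives the block

# Drop the views before closing, then let the owner remove the block
del reader_view, owner_view
reader.close()
owner.close()
owner.unlink()
```

Readers call only `close()`; `unlink()` should be called once, by whichever process owns the block.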
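Example of accessing the matrix
Because the matrix is symmetric, any element can be read from the stored upper triangle: if i > j, swap the indices.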
def get_element(matrix, i, j):
    if i <= j:
        return matrix[i, j]
    else:
        return matrix[j, i]

# Example of accessing elements
print(get_element(upper_diag_A_shm, 1, 4))  # Access an element
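If the upper triangle were instead stored as a packed 1-D array (row by row, n(n+1)/2 values), the same symmetric lookup would need an index calculation: row i starts at offset i*n - i*(i-1)/2, and the element sits j - i places further on. A sketch under that assumed layout; `packed_index` and `get_packed_element` are hypothetical names.

```python
import numpy as np

def packed_index(i, j, n):
    """Position of element (i, j) in a row-major packed upper triangle."""
    if i > j:
        i, j = j, i  # symmetry: (i, j) and (j, i) share one stored value
    return i * n - i * (i - 1) // 2 + (j - i)

def get_packed_element(packed, i, j, n):
    return packed[packed_index(i, j, n)]

n = 5
rng = np.random.default_rng(0)
A = rng.random((n, n))
A = (A + A.T) / 2  # make A symmetric

# np.triu_indices walks the upper triangle row by row, matching packed_index
packed = A[np.triu_indices(n)]

value = get_packed_element(packed, 4, 1, n)  # same as A[1, 4]
```

The 1-D array can be placed in shared memory exactly like the 2-D one, using its own shape and dtype.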
Cleanup
When you are done with the shared memory, it is important to clean it up properly.
# Cleanup
upper_diag_A_shm = None
shm.close()
shm.unlink()  # Remove the shared memory block
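A block that is never unlinked can outlive the program on POSIX systems, so it may be worth tying its lifetime to a context manager so that cleanup runs even when an exception is raised. A minimal sketch; `shared_array` is a hypothetical helper, not part of `multiprocessing`.

```python
from contextlib import contextmanager
import numpy as np
from multiprocessing import shared_memory

@contextmanager
def shared_array(source):
    """Copy source into a new shared block and release the block on exit."""
    shm = shared_memory.SharedMemory(create=True, size=source.nbytes)
    view = None
    try:
        view = np.ndarray(source.shape, dtype=source.dtype, buffer=shm.buf)
        view[:] = source
        yield view, shm.name
    finally:
        view = None   # drop our reference before closing the block
        shm.close()
        shm.unlink()  # remove the block; runs even if the body raised

with shared_array(np.triu(np.ones((3, 3)))) as (arr, name):
    total = float(arr.sum())  # upper triangle of a 3 x 3 matrix of ones
    block_name = name
    del arr  # the view must not be used once the block is closed
```

This suits the owning process; readers that attach by name would still call only `close()`.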
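This approach combines file operations with shared-memory handling in Python, which makes it suitable for applications that reuse the matrix across runs or processes: the file avoids recomputing the matrix, and the shared block lets several processes read a single copy instead of each holding its own.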