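My end goal is to replace some methods that use kneighbors_graph with transformers from the sklearn-ann package. All of the methods in sklearn-ann are implemented as sklearn-compatible transformer objects. However, the function I'm trying to replace uses kneighbors_graph(mode="connectivity", include_self=True), and I'm having a hard time converting the distance output with include_self=False into this type of connectivity matrix. Not all of the transformer objects support connectivity mode with self included, but all of them give access to distance computation without self.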
I'm able to reproduce kneighbors_graph(mode="connectivity", include_self=True) from kneighbors_graph(mode="distance", include_self=True) (referred to as nn_with_self). However, I can't reproduce it from kneighbors_graph(mode="distance", include_self=False) (referred to as nn_without_self), which is the same as the output of KNeighborsTransformer(mode="distance").fit_transform.
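I can see that nn_without_self is a superset of nn_with_self, but I don't know how the backend algorithm chooses which fields to keep.
How can I recreate the nn_with_self matrix from nn_without_self below?
I tried looking at the backend code, but it's like class-inheritance inception: I found myself paging through several files at once and losing track on GitHub.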
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer
X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors = 3
# Nearest neighbors
nn_with_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=True, n_jobs=-1).todense()
nn_without_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=False, n_jobs=-1).todense()
nn_from_transformer = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors, metric="euclidean", n_jobs=-1).fit_transform(X)
np.all(nn_from_transformer == nn_without_self)
# True
np.all(nn_with_self == nn_without_self)
# False
# Is `nn_with_self` symmetric?
np.allclose(nn_with_self,nn_with_self.T)
# False
# Is `nn_without_self` symmetric?
np.allclose(nn_without_self,nn_without_self.T)
# False
Here are the actual arrays:
nn_with_self
# matrix([[0. , 0.70550439, 0. , 0.20463097, 0. ,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0. , 0. , 0.51947869, 0. ,
# 0. , 0. , 0. , 0. , 0.44145655],
# [0. , 0. , 0. , 0. , 0.50025504,
# 0. , 0. , 0. , 0.49481662, 0. ],
# [0.20463097, 0.51947869, 0. , 0. , 0. ,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0. , 0.50025504, 0. , 0. ,
# 0. , 0. , 0. , 0.34132965, 0. ],
# [0. , 0.88867318, 0. , 0. , 0. ,
# 0. , 0. , 0. , 0. , 0.44956691],
# [0. , 0. , 1.10390699, 0. , 1.52953542,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0. , 0. , 0. , 0. ,
# 3.62670755, 0. , 0. , 0. , 3.83571739],
# [0. , 0. , 0.49481662, 0. , 0.34132965,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0.44145655, 0. , 0. , 0. ,
# 0.44956691, 0. , 0. , 0. , 0. ]])
nn_without_self
# matrix([[0. , 0.70550439, 0. , 0.20463097, 1.02852831,
# 0. , 0. , 0. , 0. , 0. ],
# [0.70550439, 0. , 0. , 0.51947869, 0. ,
# 0. , 0. , 0. , 0. , 0.44145655],
# [0. , 0. , 0. , 0. , 0.50025504,
# 0. , 1.10390699, 0. , 0.49481662, 0. ],
# [0.20463097, 0.51947869, 0. , 0. , 0. ,
# 0. , 0. , 0. , 0. , 0.95611187],
# [1.02852831, 0. , 0.50025504, 0. , 0. ,
# 0. , 0. , 0. , 0.34132965, 0. ],
# [0. , 0.88867318, 0. , 1.40547465, 0. ,
# 0. , 0. , 0. , 0. , 0.44956691],
# [0. , 0. , 1.10390699, 0. , 1.52953542,
# 0. , 0. , 0. , 1.59848513, 0. ],
# [0. , 4.1280709 , 0. , 0. , 0. ,
# 3.62670755, 0. , 0. , 0. , 3.83571739],
# [1.36553076, 0. , 0.49481662, 0. , 0.34132965,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0.44145655, 0. , 0.95611187, 0. ,
# 0.44956691, 0. , 0. , 0. , 0. ]])
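One way to see which entries are kept is to count the nonzero distances in each row of the two graphs. The sketch below reuses the setup above; the variable names are illustrative only, and it assumes the data contains no duplicate points (which would produce zero distances between distinct samples).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph

X, _ = make_classification(n_samples=10, n_features=4, n_classes=2,
                           n_clusters_per_class=1, random_state=0)
n_neighbors = 3

with_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance",
                             metric="euclidean", include_self=True).toarray()
without_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance",
                                metric="euclidean", include_self=False).toarray()

# Number of nonzero distances stored in each row of the two graphs
per_row_with_self = np.count_nonzero(with_self, axis=1)
per_row_without_self = np.count_nonzero(without_self, axis=1)

# Check that every nonzero entry of with_self appears unchanged in without_self
mask = with_self != 0
is_subset = np.array_equal(with_self[mask], without_self[mask])

print(per_row_with_self, per_row_without_self, is_subset)
```

If each row of with_self holds one fewer nonzero distance than the matching row of without_self, that points to the self-neighbor taking up one of the n_neighbors slots.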
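To reproduce kneighbors_graph(include_self=True) with KNeighborsTransformer from sklearn, you can follow these steps:
1. Use KNeighborsTransformer to generate the distance graph without self-neighbors. With mode="distance" it returns each point's n_neighbors nearest other points, the same graph as kneighbors_graph(include_self=False), so the self-neighbor has to be accounted for manually.
2. Modify the resulting graph so that each point counts as its own neighbor at distance zero. Because the point itself then takes one of the n_neighbors slots, only the n_neighbors - 1 nearest other points are kept in each row.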
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer
from scipy.sparse import csr_matrix
# Generate sample data
X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors = 3
# Generate the distance matrix without self-distances
knn_transformer = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors, metric="euclidean", n_jobs=-1)
nn_without_self = knn_transformer.fit_transform(X)
# Convert the distance matrix to a dense array
nn_without_self = nn_without_self.toarray()
# Querying with X itself returns each point as one of its own neighbors,
# so these indices are the point plus its n_neighbors - 1 nearest other points
neighbor_indices = knn_transformer.kneighbors(X, n_neighbors=n_neighbors, return_distance=False)
# Create a matrix to store the final graph; the diagonal (self-distance) stays 0
nn_with_self = np.zeros_like(nn_without_self)
# Copy only the distances of those neighbors from nn_without_self to nn_with_self
for i in range(nn_with_self.shape[0]):
    for neighbor_index in neighbor_indices[i]:
        nn_with_self[i, neighbor_index] = nn_without_self[i, neighbor_index]
# Convert nn_with_self to a sparse matrix in the CSR format
nn_with_self_sparse = csr_matrix(nn_with_self)
# Print the resulting matrix
print("nn_with_self:\n", nn_with_self)
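This approach manually treats each point as its own neighbor at distance zero in the nearest-neighbor graph, replicating the behavior of kneighbors_graph with include_self=True.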
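For the connectivity matrix that the question ultimately needs, the same idea can be applied directly to the sparse output without densifying it. The sketch below is one possible way to do this and is not taken from the sklearn or sklearn-ann source: the helper name connectivity_with_self is hypothetical, and it assumes the input is a sparse distance graph in which every row stores at least n_neighbors - 1 off-diagonal entries, as KNeighborsTransformer(mode="distance") produces. Ties between equal distances may be broken differently from kneighbors_graph.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsTransformer, kneighbors_graph


def connectivity_with_self(distance_graph, n_neighbors):
    """Hypothetical helper: build a connectivity graph in which each point
    is its own neighbor, from a sparse distance graph without self."""
    graph = csr_matrix(distance_graph)
    rows, cols = [], []
    for i in range(graph.shape[0]):
        start, end = graph.indptr[i], graph.indptr[i + 1]
        idx = graph.indices[start:end]
        dist = graph.data[start:end]
        # Drop the explicit self entry, if the graph stores one
        keep = idx != i
        idx, dist = idx[keep], dist[keep]
        # Keep the n_neighbors - 1 nearest other points
        nearest = idx[np.argsort(dist, kind="stable")[: n_neighbors - 1]]
        # The point itself fills the remaining neighbor slot
        rows.extend([i] * (len(nearest) + 1))
        cols.extend([i, *nearest])
    data = np.ones(len(rows))
    return csr_matrix((data, (rows, cols)), shape=graph.shape)


X, _ = make_classification(n_samples=10, n_features=4, n_classes=2,
                           n_clusters_per_class=1, random_state=0)
n_neighbors = 3
distances = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors,
                                  metric="euclidean").fit_transform(X)
result = connectivity_with_self(distances, n_neighbors)

# Compare against the function being replaced
expected = kneighbors_graph(X, n_neighbors=n_neighbors, mode="connectivity",
                            metric="euclidean", include_self=True)
matches = np.array_equal(result.toarray(), expected.toarray())
print(matches)
```

Because the helper only reads the sparse distance graph, it should in principle work with any transformer that returns a graph of that shape, though that has not been checked against sklearn-ann here.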