How can I reproduce `kneighbors_graph(include_self=True)` using `KNeighborsTransformer` in sklearn?


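My end goal is to replace some methods that use kneighbors_graph with transformers from the sklearn-ann package. All of the methods in sklearn-ann are implemented as sklearn-compatible transformer objects. However, the function I am trying to replace uses kneighbors_graph(mode="connectivity", include_self=True), and I am having trouble converting the include_self=False distance output into this type of connectivity matrix. Not all of the transformer objects support connectivity mode with self included, but all of them give access to distance calculations without self.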

I am able to reproduce kneighbors_graph(mode="connectivity", include_self=True) from kneighbors_graph(mode="distance", include_self=True) (referred to as nn_with_self). However, I cannot reproduce it from kneighbors_graph(mode="distance", include_self=False) (referred to as nn_without_self), which is the same as the output of KNeighborsTransformer(mode="distance").fit_transform.

I can see that nn_without_self is a superset of nn_with_self, but I don't know how the backend algorithm chooses which entries to keep.

How can I recreate the nn_with_self matrix from the nn_without_self matrix below?

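I tried looking at the backend code, but it is a chain of class inheritance, and I found myself flipping through several files at once and losing track on GitHub.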

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer

X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors=3

# Nearest neighbors
nn_with_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=True,n_jobs=-1).todense()
nn_without_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=False,n_jobs=-1).todense()
nn_from_transformer = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors, metric="euclidean", n_jobs=-1).fit_transform(X)

np.all(nn_from_transformer == nn_without_self)
# True

np.all(nn_with_self == nn_without_self)
# False

# Is `nn_with_self` symmetric?
np.allclose(nn_with_self,nn_with_self.T)
# False

# Is `nn_without_self` symmetric?
np.allclose(nn_without_self,nn_without_self.T)
# False

Here are the actual arrays:

nn_with_self
# matrix([[0.        , 0.70550439, 0.        , 0.20463097, 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.        , 0.        , 0.51947869, 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.44145655],
#         [0.        , 0.        , 0.        , 0.        , 0.50025504,
#          0.        , 0.        , 0.        , 0.49481662, 0.        ],
#         [0.20463097, 0.51947869, 0.        , 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.        , 0.50025504, 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.34132965, 0.        ],
#         [0.        , 0.88867318, 0.        , 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.44956691],
#         [0.        , 0.        , 1.10390699, 0.        , 1.52953542,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.        , 0.        , 0.        , 0.        ,
#          3.62670755, 0.        , 0.        , 0.        , 3.83571739],
#         [0.        , 0.        , 0.49481662, 0.        , 0.34132965,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.44145655, 0.        , 0.        , 0.        ,
#          0.44956691, 0.        , 0.        , 0.        , 0.        ]])

nn_without_self
# matrix([[0.        , 0.70550439, 0.        , 0.20463097, 1.02852831,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.70550439, 0.        , 0.        , 0.51947869, 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.44145655],
#         [0.        , 0.        , 0.        , 0.        , 0.50025504,
#          0.        , 1.10390699, 0.        , 0.49481662, 0.        ],
#         [0.20463097, 0.51947869, 0.        , 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.95611187],
#         [1.02852831, 0.        , 0.50025504, 0.        , 0.        ,
#          0.        , 0.        , 0.        , 0.34132965, 0.        ],
#         [0.        , 0.88867318, 0.        , 1.40547465, 0.        ,
#          0.        , 0.        , 0.        , 0.        , 0.44956691],
#         [0.        , 0.        , 1.10390699, 0.        , 1.52953542,
#          0.        , 0.        , 0.        , 1.59848513, 0.        ],
#         [0.        , 4.1280709 , 0.        , 0.        , 0.        ,
#          3.62670755, 0.        , 0.        , 0.        , 3.83571739],
#         [1.36553076, 0.        , 0.49481662, 0.        , 0.34132965,
#          0.        , 0.        , 0.        , 0.        , 0.        ],
#         [0.        , 0.44145655, 0.        , 0.95611187, 0.        ,
#          0.44956691, 0.        , 0.        , 0.        , 0.        ]])
1 Answer

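To reproduce kneighbors_graph(include_self=True) with KNeighborsTransformer from sklearn, you can proceed in two steps:

First, generate the nearest-neighbor distance graph with KNeighborsTransformer(mode="distance"). Once densified, it matches kneighbors_graph(include_self=False), so the self entries have to be handled manually. Second, rebuild the graph so that each point counts as one of its own n_neighbors neighbors, with a self-distance of zero.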

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsTransformer
from scipy.sparse import csr_matrix

# Generate sample data
X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors = 3

# Generate the distance graph with the transformer and convert it to a dense array
# (once densified, it matches kneighbors_graph(..., include_self=False))
knn_transformer = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors, metric="euclidean", n_jobs=-1)
nn_without_self = knn_transformer.fit_transform(X).toarray()

# Query the fitted transformer once, outside any loop. Because X is passed
# explicitly, each point comes back as its own nearest neighbor (assuming no
# duplicate points), followed by its n_neighbors - 1 closest other points.
neighbor_indices = knn_transformer.kneighbors(X, n_neighbors=n_neighbors, return_distance=False)

# Create the graph with self included and copy over only the selected entries
# (the diagonal stays 0, which is the self-distance)
nn_with_self = np.zeros_like(nn_without_self)
rows = np.arange(nn_with_self.shape[0])[:, None]
nn_with_self[rows, neighbor_indices] = nn_without_self[rows, neighbor_indices]

# Convert nn_with_self to a sparse matrix in the CSR format
# (the zero self-distances on the diagonal are not stored as explicit entries)
nn_with_self_sparse = csr_matrix(nn_with_self)

# Print the resulting matrix
print("nn_with_self:\n", nn_with_self)

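This approach manually adds the self-distances to the nearest-neighbor graph, replicating the behavior of kneighbors_graph with include_self=True. Because each self-distance is zero, in the dense matrix it shows up only as the loss of each row's farthest neighbor.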
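Since the end goal in the question is the connectivity matrix rather than the distances, the same idea can be sketched directly on the sparse output. The helper name connectivity_with_self is hypothetical and not part of sklearn or sklearn-ann. It assumes the input is a square sparse distance graph holding at least n_neighbors - 1 other points per row, with or without a stored self entry, and that distances have no ties. For each row it keeps the n_neighbors - 1 closest other points and then adds the point itself.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsTransformer, kneighbors_graph


def connectivity_with_self(distance_graph, n_neighbors):
    """Hypothetical helper: connectivity graph in which each point is its own neighbor."""
    graph = csr_matrix(distance_graph)
    rows, cols = [], []
    for i in range(graph.shape[0]):
        start, stop = graph.indptr[i], graph.indptr[i + 1]
        idx = graph.indices[start:stop]
        dist = graph.data[start:stop]
        # Drop the self entry if the transformer stored one explicitly.
        keep = idx != i
        idx, dist = idx[keep], dist[keep]
        # Self takes one slot, so only n_neighbors - 1 other points remain.
        nearest = idx[np.argsort(dist, kind="stable")[: n_neighbors - 1]]
        rows.extend([i] * (len(nearest) + 1))
        cols.extend([i, *nearest])
    data = np.ones(len(rows))
    return csr_matrix((data, (rows, cols)), shape=graph.shape)


X, _ = make_classification(n_samples=10, n_features=4, n_classes=2,
                           n_clusters_per_class=1, random_state=0)
n_neighbors = 3

distance_graph = KNeighborsTransformer(
    mode="distance", n_neighbors=n_neighbors, metric="euclidean"
).fit_transform(X)
result = connectivity_with_self(distance_graph, n_neighbors)

# Reference graph built by sklearn itself, for comparison.
expected = kneighbors_graph(X, n_neighbors=n_neighbors, mode="connectivity",
                            metric="euclidean", include_self=True)
print(np.array_equal(result.toarray(), expected.toarray()))
```

Working row by row on the CSR arrays avoids densifying the graph, which matters when the transformer comes from an approximate-neighbor library and the dataset is large. With approximate neighbors the result can only be as exact as the distances the transformer returns.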
