KNeighborsClassifier predict throws "Expected 2D array, got 1D array instead"

Question

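I am writing an image similarity algorithm. I use cv2.calcHist to extract image features. After creating the features, I save them to a JSON file as a list of numpy.float64 values:

list(numpy.float64(features))

This is a multi-dimensional vector embedding.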

In the second step, I load the data from the JSON file and prepare it for sklearn's KNeighborsClassifier.

import numpy as np
import json
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import cosine_similarity


with open('data.json') as f:
    jsonData = json.load(f)

X = []
y = []

for image in jsonData['images']:
    embeddingData = image['histogram']
    X.append(embeddingData)
    y.append(image['classification'])

X = np.array(X)
y = np.array(y)

#split dataset into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)

print('Shape of X_train:')
print(X_train.shape)
print('Shape of X_test:')
print(X_test.shape)
print('Shape of y_train:')
print(y_train.shape)

# Create KNN classifier
knn = KNeighborsClassifier(n_neighbors = 1, metric=cosine_similarity)
# Fit the classifier to the data
knn.fit(X_train, y_train)

#show predictions on the test data
y_pred = knn.predict(X_test)

When I run this code, I get the following error:

y_pred = knn.predict(X_test)

ValueError: Expected 2D array, got 1D array instead:
array=[1.13707140e-01 9.81128156e-01 2.89475545e-02 ... 0.00000000e+00
 5.02811105e-04 1.15502894e-01].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

The output of the shape prints is:

Shape of X_train:
(36, 4096)
Shape of X_test:
(9, 4096)
Shape of y_train:
(36,)

I tried the suggested reshape

y_pred = knn.predict(X_test.reshape(-1, 1))

which helped other people with the same problem, as in this post, but for me it results in

ValueError: X has 1 features, but KNeighborsClassifier is expecting 4096 features as input.

4096 is the dimensionality of my histogram features.

I also tried reshaping X_train so that it matches X_test again:

knn.fit(X_train.reshape(-1, 1), y_train)

but this results in

ValueError: Found input variables with inconsistent numbers of samples: [147456, 36]

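At first, I tried a slightly different approach based on a knn example in which the model is trained on the iris dataset, but knn.fit would not accept the training data and raised the same 2D/1D ValueError. Then I found this example from pyimagesearch, which is almost exactly what I want to do, except that I have an intermediate step involving a JSON file. In my case, however, the JSON file is necessary because I want to add more embeddings later and don't want to recompute everything.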

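What I don't understand is why knn.fit accepts the data from X_train but knn.predict does not accept the data from X_test, even though both were generated in the same way. Why does it work in one case but not in the other?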

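I have already tried the solutions suggested in this, this and this post, but as described above, the reshape solution does not work in my case. When I try adding extra brackets, like this:

y_pred = knn.predict([X_test])

I get the following error: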

ValueError: Found array with dim 3. KNeighborsClassifier expected <= 2.

I also looked for other examples, but found only a few that use a similar data structure, and the ones I found did not help.

I also found this question describing the same problem, but the accepted answer does not solve it.

This is the JSON file I am reading.

Answer 1

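Since the error message "Expected 2D array, got 1D array instead" is raised on the statement knn.predict(X_test), it is logical to assume that X_test does not have the right dimensions. But as you said, X_test does have the right dimensions, so at first glance the error does not seem to make sense.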

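In fact, in this particular case the error message is somewhat misleading, because the problem is hidden a couple of lines earlier, in the definition of knn, and more specifically in its metric: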

knn = KNeighborsClassifier(n_neighbors = 1, metric=cosine_similarity)

If you change the metric to "cosine", it will work.
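As a minimal sketch of that change: the data below is a synthetic stand-in for your histogram embeddings (the helper make_fake_histograms, the 64 features and the two class labels are assumptions made up for illustration, not your real data). Only the metric argument differs from your code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)


def make_fake_histograms(n_per_class, n_features=64):
    """Hypothetical stand-in for the histogram data: two well-separated classes."""
    half = n_features // 2
    base_a = np.concatenate([np.ones(half), np.zeros(half)])
    base_b = base_a[::-1]
    # Non-negative values, like histogram bins, with a little noise per sample.
    X = np.vstack([
        base_a + 0.05 * rng.random((n_per_class, n_features)),
        base_b + 0.05 * rng.random((n_per_class, n_features)),
    ])
    y = np.array(["a"] * n_per_class + ["b"] * n_per_class)
    return X, y


X_train, y_train = make_fake_histograms(18)  # shape (36, 64)
X_test, y_test = make_fake_histograms(4)     # shape (8, 64)

# Pass the metric as a string instead of the cosine_similarity function.
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
knn.fit(X_train, y_train)

# X_test stays 2D with shape (n_samples, n_features); no reshape is needed.
y_pred = knn.predict(X_test)
print(y_pred.shape)
```

Note that "cosine" is a distance (one minus the cosine similarity), so smaller values mean more similar vectors, which is what a nearest-neighbour search expects.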

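It is not very intuitive, but in the doc you will find the strings that can be used for metric. It also says that you can pass a function, as you tried to do, but that function should take two 1D arrays as input and return a scalar: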

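metric : str or callable, default='minkowski'. Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in distance_metrics for valid metric values. [...] If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. [...]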
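If you do want to pass a function, a sketch of a callable that matches this description could look as follows. The function cosine_distance and the synthetic data are assumptions written for illustration; the function assumes non-zero input vectors and returns a distance rather than a similarity.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def cosine_distance(u, v):
    """Take two 1D vectors and return one scalar: 1 - cosine similarity.

    Assumes neither vector is all zeros.
    """
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


rng = np.random.default_rng(0)
n_features = 32
half = n_features // 2
base_a = np.concatenate([np.ones(half), np.zeros(half)])
base_b = base_a[::-1]

# Synthetic, non-negative stand-in data with two well-separated classes.
X_train = np.vstack([
    base_a + 0.05 * rng.random((10, n_features)),
    base_b + 0.05 * rng.random((10, n_features)),
])
y_train = np.array([0] * 10 + [1] * 10)
X_test = np.vstack([
    base_a + 0.05 * rng.random((3, n_features)),
    base_b + 0.05 * rng.random((3, n_features)),
])
y_test = np.array([0] * 3 + [1] * 3)

# The callable is invoked on pairs of 1D rows, as the documentation describes.
knn_callable = KNeighborsClassifier(n_neighbors=1, metric=cosine_distance)
knn_callable.fit(X_train, y_train)
pred_callable = knn_callable.predict(X_test)

# The built-in string metric should give the same predictions on this data.
knn_builtin = KNeighborsClassifier(n_neighbors=1, metric="cosine")
knn_builtin.fit(X_train, y_train)
pred_builtin = knn_builtin.predict(X_test)

print(pred_callable, pred_builtin)
```

A Python callable is evaluated pair by pair, so it is likely to be noticeably slower than the string metric on larger datasets; the string is usually the better choice when it exists.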

But if you look at the definition of cosine_similarity(), it says that this function takes two 2D arrays and returns a 2D array.

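That is why you get the error message "Expected 2D array, got 1D array instead". The error message is not directly linked to what you pass to predict(), but to what is passed to the metric function that predict() calls!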
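The mismatch can be reproduced without any classifier. In this small sketch, the vectors u and v are made-up values; calling cosine_similarity() on two 1D vectors, which is the kind of input a callable metric receives, raises the same ValueError, while 2D inputs return a 2D array.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 1.0, 0.0])

# Two 1D vectors: cosine_similarity rejects them.
message = None
try:
    cosine_similarity(u, v)
except ValueError as exc:
    message = str(exc)
    print(message.splitlines()[0])

# Two 2D arrays of shape (1, n_features): returns a 2D array of shape (1, 1).
sim = cosine_similarity(u.reshape(1, -1), v.reshape(1, -1))
print(sim.shape)
```

So cosine_similarity() fails the metric contract twice over: it expects 2D inputs, and it returns a similarity matrix instead of a single distance.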
