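Trying to plot a ROC curve with an SVM on a dataset of 1200 rows and 179 columns (used as features) produces the following error:
'too many indices for array'
Code: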
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svm = svclassifier.fit(X_train, Y_train).decision_function(X_test)
Y_pred = svclassifier.predict(X_test)
ns_predt = [0 for _ in range(len(Y_test))]
Y_predt = Y_pred[:,1]
IndexError                                Traceback (most recent call last)
<ipython-input-92-62de12967d46> in <module>
----> 1 Y_predt = Y_pred[:,1]
IndexError: too many indices for array
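The error you are getting is a mismatch between the indices requested in 'Y_pred[:,1]' and the indices that are available. You are asking for all rows (the colon ':') of column 1 (which, with Python's zero-based indexing, is actually the second column). However, Y_pred is a one-dimensional numpy array, so it has no columns at all.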
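The shape mismatch can be seen without any classifier. The following is a minimal sketch that uses a hand-written array as a stand-in for the output of predict (the values and the name proba_like are made up for illustration):

```python
import numpy as np

# Stand-in for what SVC.predict returns: one label per sample, shape (n_samples,)
Y_pred = np.array([0, 1, 1, 0, 1])
print(Y_pred.ndim, Y_pred.shape)

# Asking for a column of a 1-D array raises IndexError
try:
    Y_pred[:, 1]
    raised = False
except IndexError as exc:
    raised = True
    print('IndexError:', exc)

# The same index works on a 2-D array, which does have columns
proba_like = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])
second_col = proba_like[:, 1]
print(second_col)
```

Printing Y_pred.shape before indexing is usually the quickest way to confirm how many dimensions an array really has.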
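I am not sure what you are trying to do with ns_predt = [0 for _ in range(len(Y_test))]
or Y_predt = Y_pred[:,1]
so I cannot offer you an alternative. But the problem is clear: you are requesting a column that does not exist.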
The problem can easily be reproduced with the following code:
import pandas as pd
import numpy as np
import pdb
from sklearn.svm import SVC
print('Creating fake data..')
X_train = pd.DataFrame(np.random.randint(0,1000,size=(100, 4)), columns=list('ABCD'))
Y_train = pd.DataFrame(np.random.randint(0,10,size=(100, 1)), columns=list('E'))
X_test = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
Y_test = pd.DataFrame(np.random.randint(0,10,size=(100, 1)), columns=list('E'))
print('Initializing classifier')
svclassifier = SVC(kernel='linear')
print('Training the model')
svm = svclassifier.fit(X_train, Y_train).decision_function(X_test)
print('Predicting outcome')
Y_pred = svclassifier.predict(X_test)
print('... ? ...')
ns_predt = [0 for _ in range(len(Y_test))]
try:
    Y_predt = Y_pred[:,1]
except IndexError:
    # Y_pred is 1-D, so asking for column 1 raises IndexError
    print('I failed...')
    pdb.set_trace()
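If the intent behind Y_pred[:,1] was to obtain a score for the positive class in order to draw the ROC curve, one possible approach is to pass the output of decision_function straight to roc_curve. This is only a sketch under assumptions: the problem is assumed to be binary, and the synthetic data from make_classification stands in for the real 1200 x 179 dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary data standing in for the real dataset (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, Y_train)

# For a binary problem decision_function returns a 1-D array of scores,
# shape (n_samples,), so no column indexing is needed
Y_score = svclassifier.decision_function(X_test)

# roc_curve accepts continuous scores, not only probabilities
fpr, tpr, thresholds = roc_curve(Y_test, Y_score)
auc = roc_auc_score(Y_test, Y_score)
print('ROC AUC: %.3f' % auc)
```

The fpr and tpr arrays can then be plotted against each other, for example with matplotlib. With more than two classes decision_function returns a 2-D array and the ROC curve has to be computed per class, which this sketch does not cover.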