To split the data into training and test sets, I use:
X_train, X_test, y_train, y_test = train_test_split(X, y.iloc[:,1], test_size=0.3,random_state=seed, stratify=y)
But when I run it, I get this error (I have added the shapes of X and y below):
Traceback (most recent call last):
  ... , in <module>
    X_train, X_test, y_train, y_test = train_test_split(X, y.iloc[:,1], test_size=0.3,random_state=seed, stratify=y)
AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
EDIT: The shapes are:
Shape(X)= (284807, 28)
Shape(y)= (284807,)
Then I tried:
X_train, X_test, y_train, y_test = train_test_split(X, y[:,1], test_size=0.3,random_state=seed, stratify=y)
But then I got:
IndexError: too many indices for array
How can I fix this?
As suggested in the comments, try replacing y.iloc[:,1] with y:
X_train, X_test, y_train, y_test = train_test_split(X,
y,
test_size=0.3,
random_state=seed)
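EDIT: As the documentation describes, the stratify parameter takes an array-like of class labels with one entry per sample, so y itself can be passed here. The length 2 * len(arrays) mentioned in the documentation refers to the returned list of splits, where arrays are the inputs such as X and y.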
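To make the point about stratify concrete, here is a minimal sketch on synthetic data. The array sizes, the 5% positive rate, and the value of seed are assumptions made for illustration; the question does not show them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data in the question (assumed sizes):
# 1000 samples, 28 features, and a 1-D label vector with a rare class (5%).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 28))
y = np.zeros(1000, dtype=int)
y[:50] = 1

seed = 42  # hypothetical value; the question does not show it

# y is already 1-D, so it is passed directly, both as the labels
# and as the stratify argument (one class label per sample).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=seed, stratify=y
)

print(X_train.shape, X_test.shape)    # sizes of the two feature splits
print(y_train.mean(), y_test.mean())  # positive-class rate in each split
```

With stratify=y, the positive-class rate should stay about the same in the training and test splits, which matters for heavily imbalanced labels.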
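iloc is an indexer available on pandas DataFrame and Series objects, not on NumPy ndarrays.
To access elements, you can use index and slice notation directly on the ndarray, or convert the ndarray to a pandas DataFrame, as follows: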
import pandas as pd
df = pd.DataFrame(nda)
y = df.iloc[:,1].to_numpy()  # convert the selected Series from the DataFrame to an ndarray
DataFrames offer a great deal of flexibility when working with data. train_test_split accepts array-like inputs, so when an ndarray is needed you can convert a DataFrame with DataFrame.to_numpy.
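The difference between the two access styles can be sketched as follows. The small arrays are made up for illustration; nda stands for any 2-D ndarray, as in the snippet above.

```python
import numpy as np
import pandas as pd

# A 1-D array, like y in the question with shape (284807,): it has a
# single axis, so asking for a second index raises IndexError.
y = np.array([0, 1, 0, 0, 1])
try:
    y[:, 1]
except IndexError as exc:
    print("IndexError:", exc)

# A 2-D array does have columns, so both approaches work on it.
nda = np.array([[10, 0], [11, 1], [12, 0]])   # illustrative values
col_by_slice = nda[:, 1]                      # plain NumPy slicing
df = pd.DataFrame(nda)
col_by_iloc = df.iloc[:, 1].to_numpy()        # via pandas, back to ndarray

print(col_by_slice, col_by_iloc)
```

Checking y.ndim or y.shape before indexing is a quick way to tell which case applies.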