I am trying to create a KFold object to pass to xgboost.cv, and I have:
import pandas as pd
from sklearn.model_selection import KFold
df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10]])
KF = KFold(n_splits=2)
kf = KF.split(df)
But it seems I can only enumerate it once:
for i, (train_index, test_index) in enumerate(kf):
    print(f"Fold {i}")
for i, (train_index, test_index) in enumerate(kf):
    print(f"Again_Fold {i}")
This gives the output:
Fold 0
Fold 1
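The second enumeration seems to run over an empty object.
I may be misunderstanding something fundamental, or I messed up somewhere, but can someone explain this behavior?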
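[Edit, adding a follow-up question] This behavior seems to be why passing the splits to xgboost.cv as xgboost.cv(..., folds = KF.split(df)) raises an index out of range error. My workaround is to rebuild the splits as a list of tuples: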
kf = []
for i, (train_index, test_index) in enumerate(KF.split(df)):
    this_split = (list(train_index), list(test_index))
    kf.append(this_split)
xgboost.cv(..., folds = kf)
I am looking for a smarter solution.
Take this example:
from sklearn.model_selection import KFold
import xgboost as xgb
import numpy as np
data = np.random.rand(5, 10) # 5 entities, each contains 10 features
label = np.random.randint(2, size=5) # binary target
dtrain = xgb.DMatrix(data, label=label)
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
If we run your code:
KF = KFold(n_splits=2)
xgboost.cv(params= param,dtrain=dtrain, folds = KF.split(df))
I get the error:
IndexError Traceback (most recent call last)
Cell In[51], line 2
1 KF = KFold(n_splits=2)
----> 2 xgboost.cv(params= param,dtrain=dtrain, folds = KF.split(df))
[..]
IndexError: list index out of range
The documentation asks for a KFold instance, so you only need to do:
KF = KFold(n_splits=2)
xgb.cv(params= param,dtrain=dtrain, folds = KF)
If you look at the source code, you will see that it calls the split method itself, so you do not need to pass KF.split(..).
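As for the first part of the question, the behavior is most likely explained by KFold.split returning a generator, which can be consumed only once. The IndexError is consistent with the folds being iterated more than once inside xgboost.cv, although that is my reading and not something the documentation states. A minimal sketch using scikit-learn only; the array X and the names first_pass, second_pass and folds are illustrative and not part of the original post:

```python
import numpy as np
from sklearn.model_selection import KFold

# Illustrative data: 5 samples, 2 features
X = np.arange(10).reshape(5, 2)
KF = KFold(n_splits=2)

# split() returns a one-shot generator, not a list
gen = KF.split(X)
first_pass = [(train.tolist(), test.tolist()) for train, test in gen]
# The generator is already exhausted, so this yields nothing
second_pass = [(train.tolist(), test.tolist()) for train, test in gen]

print(len(first_pass))   # 2
print(len(second_pass))  # 0

# Materialize the folds once if they must be iterated repeatedly
folds = list(KF.split(X))
print(len(folds))        # 2
```

If you do need explicit index pairs, list(KF.split(df)) is a shorter form of the workaround in the question. Passing the KFold instance itself, as shown above, avoids the issue entirely.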