Why can sklearn's KFold only be enumerated once (also when using it in xgboost.cv)?


I'm trying to create a KFold object for my xgboost.cv call, and I have:

import pandas as pd
from sklearn.model_selection import KFold

df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10]])

KF = KFold(n_splits=2)
kf = KF.split(df)

But it seems I can only enumerate it once:

for i, (train_index, test_index) in enumerate(kf):
    print(f"Fold {i}")

for i, (train_index, test_index) in enumerate(kf):
    print(f"Again_Fold {i}")

gives the output:

Fold 0
Fold 1

The second enumeration seems to run over an empty object.

I may be misunderstanding something fundamental, or I may have made a mistake somewhere, but can someone explain this behavior?

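[Edit, adding a follow-up question] This behavior seems to be the reason that passing the KFold splits to xgboost.cv with xgboost.cv(..., folds = KF.split(df)) raises an index-out-of-range error. My workaround is to rebuild the list of tuples myself: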

kf = []
for i, (train_index, test_index) in enumerate(KF.split(df)):
    this_split = (list(train_index), list(test_index))
    kf.append(this_split)

xgboost.cv(..., folds = kf)

I'm looking for a smarter solution.

python scikit-learn xgboost cross-validation k-fold
1 Answer

Take this example:

from sklearn.model_selection import KFold
import xgboost as xgb
import numpy as np

data = np.random.rand(5, 10)  # 5 entities, each contains 10 features
label = np.random.randint(2, size=5)  # binary target
dtrain = xgb.DMatrix(data, label=label)

param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}

If we run your code (with df defined as in your question and xgboost imported under its full name):

KF = KFold(n_splits=2)
xgboost.cv(params= param,dtrain=dtrain, folds = KF.split(df))

I get the error:

IndexError                                Traceback (most recent call last)
Cell In[51], line 2
      1 KF = KFold(n_splits=2)
----> 2 xgboost.cv(params= param,dtrain=dtrain, folds = KF.split(df))
[..]

IndexError: list index out of range
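
One plausible way such an error can arise is sketched below in plain Python. It assumes the consumer walks the folds argument twice, once for the training indices and once for the test indices; that is an assumption about xgboost's internals, not something verified here. Both collect_indices and toy_split are hypothetical stand-ins invented for this illustration.

```python
def toy_split():
    """Hypothetical stand-in for KF.split(df): a generator of (train, test) pairs."""
    yield [1], [0]
    yield [0], [1]


def collect_indices(folds):
    """Hypothetical consumer that iterates over `folds` twice (assumed behavior)."""
    train_sets = [pair[0] for pair in folds]  # first pass consumes a generator
    test_sets = [pair[1] for pair in folds]   # second pass sees nothing if `folds` is a generator
    return train_sets, test_sets


# With a generator, the second pass comes back empty.
train_sets, test_sets = collect_indices(toy_split())

# Indexing into the empty list then fails.
try:
    test_sets[0]
    message = None
except IndexError as exc:
    message = str(exc)

# With a list, both passes see every fold.
list_train, list_test = collect_indices(list(toy_split()))
```

If this is what happens, it would also explain why the hand-built list of tuples in the question works: a list can be iterated any number of times.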

The documentation asks for a KFold instance, so you only need to do:

KF = KFold(n_splits=2)
xgb.cv(params= param,dtrain=dtrain, folds = KF)

You can check the source code: it calls the split method itself, so there is no need to pass KF.split(..).
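
On the first part of the question: KFold.split is a generator function, so the object it returns can be iterated only once, while the KFold instance itself can be split as often as you like. The sketch below uses a NumPy array with the same shape as the df in the question (the name X is just for this illustration) and shows that wrapping the call in list() gives a reusable list of folds, which is a shorter form of the workaround loop.

```python
import numpy as np
from sklearn.model_selection import KFold

# Same shape as df in the question: 2 rows (samples), 5 columns.
X = np.arange(1, 11).reshape(2, 5)

KF = KFold(n_splits=2)

# split() returns a generator: the first loop consumes it.
gen = KF.split(X)
first_pass = [(train.tolist(), test.tolist()) for train, test in gen]
second_pass = list(gen)  # already exhausted, so nothing is left

# Materialize the folds once to iterate over them repeatedly.
folds = list(KF.split(X))
reuse_a = [(train.tolist(), test.tolist()) for train, test in folds]
reuse_b = [(train.tolist(), test.tolist()) for train, test in folds]

# Calling split() again on the same KFold object gives a fresh generator.
fresh = [(train.tolist(), test.tolist()) for train, test in KF.split(X)]
```

Here second_pass is empty, while reuse_a, reuse_b and fresh all contain both folds.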
