I have a dataset for which building an xgbTree model without weights works fine, but as soon as I add weights (even when the weights are all 1), the model fails to converge. I get the error Something is wrong; all the RMSE metric values are missing:
and when I print the warnings, the last message is In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... : There were missing values in resampled performance measures.
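Here is a Drive link to an RData file containing the data. It is too large to print, and smaller samples do not always reproduce the error.
It contains 3 objects: input_x, input_y, and wts. The last one is just a vector of 1s, but ideally it should eventually be able to take numbers on the interval (0,1). The code I am using is shown below. Note the comment next to the weights argument, which marks the line that triggers the error.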
nrounds <- 1000
tune_grid <- expand.grid(
  nrounds = seq(from = 200, to = nrounds, by = 50),
  eta = c(0.025, 0.05, 0.1, 0.3),
  max_depth = c(2, 3, 4, 5),
  gamma = 0,
  colsample_bytree = 1,
  min_child_weight = 1,
  subsample = 1
)
tune_control <- caret::trainControl(
  method = "cv",
  number = 3,
  verboseIter = FALSE,
  allowParallel = TRUE
)
xgb_tune <- caret::train(
  x = input_x,
  y = input_y,
  weights = wts, # If I remove this line, the code works fine. When included, even if just 1s, it throws an error.
  trControl = tune_control,
  tuneGrid = tune_grid,
  method = "xgbTree",
  verbose = TRUE
)
According to the function source, the weights parameter is called wts. The relevant lines are:
if (!is.null(wts))
  xgboost::setinfo(x, 'weight', wts)
Running
xgb_tune <- caret::train(
  x = input_x,
  y = input_y,
  wts = wts,
  trControl = tune_control,
  tuneGrid = tune_grid,
  method = "xgbTree",
  verbose = TRUE
)
should produce the expected result.
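If the error persists, it may be worth first ruling out a malformed weights vector. Below is a minimal base R sketch and not part of caret or xgboost: check_weights is a hypothetical helper, and the synthetic input_x, input_y, and wts only stand in for the objects in the RData file. The requirement that weights be strictly positive is an assumption based on the intended (0,1) range plus the all-1s case.

```r
# Synthetic stand-ins for the objects in the RData file (NOT the real data).
set.seed(1)
input_x <- matrix(rnorm(200), nrow = 50, ncol = 4)
input_y <- rnorm(50)
wts <- rep(1, nrow(input_x))

# Hypothetical helper: stops with an error if the weights vector is
# non-numeric, has the wrong length, contains NA, or has non-positive values.
check_weights <- function(x, y, w) {
  stopifnot(
    is.numeric(w),
    length(w) == nrow(x),
    length(w) == length(y),
    !anyNA(w),
    all(w > 0)
  )
  invisible(TRUE)
}

check_weights(input_x, input_y, wts)
```

If this check passes on the real data, the weights vector itself is unlikely to be the problem, which points back at how the argument is handed to the model.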