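It is not the number of trees, since I only trained 25. It is not the value of the variable either: the ratios between the values in parentheses make no sense for that, since many of the variables are log-transformed. I checked the documentation and found no explanation. Any ideas or other references?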
df1 <- xgb.train(data = X_train_dmat,
                 eta = 0.1,
                 max_depth = 5,
                 nrounds = 25,  # full argument name; `nround` only works through partial matching
                 subsample = 0.5,
                 colsample_bytree = 0.5,
                 booster = 'gbtree',
                 objective = 'reg:squarederror',
                 nthread = 3
)
xgb.plot.multi.trees(model = df1,
                     features_keep = 5,
                     use.names = FALSE,
                     plot_width = NULL,
                     plot_height = NULL,
                     render = TRUE
)
Looking at the source code, https://github.com/dmlc/xgboost/blob/master/R-package/R/xgb.plot.multi.trees.R#L94, this is the part that creates the nodes in the tree:
nodes.dt <- tree.matrix[
    , .(Quality = sum(Quality))
    , by = .(abs.node.position, Feature)
  ][, .(Text = paste0(Feature[1:min(length(Feature), features_keep)],
                      " (",
                      format(Quality[1:min(length(Quality), features_keep)], digits=5),
                      ")") %>%
          paste0(collapse = "\n"))
    , by = abs.node.position]
Specifically, this is the code that writes those numbers:
format(Quality[1:min(length(Quality), features_keep)], digits=5)
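To see what that snippet does, here is a minimal sketch that runs the same two-step aggregation on a toy table. The table contents are made up for illustration and are not output from a real model, and `n_keep` is a hypothetical helper I introduced to shorten the indexing; only the column names (`abs.node.position`, `Feature`, `Quality`) and `features_keep` come from the source above.

```r
library(data.table)

# Toy stand-in for the per-node table built inside xgb.plot.multi.trees.
# The values are invented; each row is one node from one tree.
tree.matrix <- data.table(
  abs.node.position = c("0-0", "0-0", "0-0", "1-0", "1-0"),
  Feature           = c("f1",  "f1",  "f2",  "f3",  "f3"),
  Quality           = c(10.5,  4.5,   2.0,   1.25,  0.75)
)
features_keep <- 5

# Step 1: sum Quality per (position, feature) over all trees.
summed <- tree.matrix[, .(Quality = sum(Quality)),
                      by = .(abs.node.position, Feature)]

# Hypothetical helper: indices of at most `features_keep` entries.
n_keep <- function(x) seq_len(min(length(x), features_keep))

# Step 2: one label per position, one "feature (summed quality)" line per feature.
nodes.dt <- summed[, .(Text = paste0(Feature[n_keep(Feature)], " (",
                                     format(Quality[n_keep(Quality)], digits = 5),
                                     ")", collapse = "\n")),
                   by = abs.node.position]

print(summed)
cat(nodes.dt$Text, sep = "\n---\n")
```

The point of the sketch is that the number in parentheses is a sum over every tree that used that feature at that position, which would explain why it matches neither a tree count nor a variable value.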
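So the numbers represent the Quality of each node, which I think reflects how well that node splits the data. It has been a while since I worked with these models, and I was never an expert, so I am not sure of my interpretation. If you want to know more about what Quality means, you can dig deeper into the source code to see how it is computed.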
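One way to check this yourself is to dump the trees into a table with `xgb.model.dt.tree` and look at the per-node values before they are summed. The sketch below is not the asker's model: it trains a small one on the `agaricus.train` data that ships with xgboost, and the parameters are arbitrary. As far as I know, the documentation for `xgb.model.dt.tree` describes this column as the split gain for split nodes and the leaf value for leaves, and newer package versions name it `Gain` instead of `Quality`, so the code looks for either name (that lookup is my assumption about version differences).

```r
library(xgboost)

# Small example model on data bundled with xgboost (not the asker's data).
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
bst <- xgb.train(
  params = list(max_depth = 3, eta = 0.1,
                objective = "binary:logistic", nthread = 1),
  data = dtrain,
  nrounds = 5
)

# One row per node across all trees; converted to a plain data.frame.
df <- as.data.frame(xgb.model.dt.tree(model = bst))

# Assumption: the column is "Quality" in older releases and "Gain" in newer ones.
gain_col <- intersect(c("Quality", "Gain"), names(df))[1]

# Split nodes only (leaf rows carry "Leaf" in the Feature column).
splits <- df[df$Feature != "Leaf", c("Tree", "Node", "Feature", gain_col)]
print(head(splits))
```

Comparing these per-node values with the totals in the plot should show whether the plotted numbers are sums of them.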