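I am using the xgboost library to train a binary classifier. I want to prevent the training algorithm from leaking data by adding noise to the weights (e.g. the values at the leaf nodes of the trees in the ensemble). To do that, I need to retrieve the weights of each tree and modify them.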
I can view the weights by calling dump_model or trees_to_dataframe on the Booster object, which I define as:
model = xgb.Booster(params, [dtrain])
The latter method returns a pandas DataFrame:
    Tree  Node    ID  Feature                          Split   Yes    No  Missing        Gain     Cover
 0     0     0   0-0  tenure                            17.0   0-1   0-2      0-1  671.161072  1595.500
 1     0     1   0-1  InternetService_Fiber optic        1.0   0-3   0-4      0-3  343.489227   621.125
 2     0     2   0-2  InternetService_Fiber optic        1.0   0-5   0-6      0-5  293.603149   974.375
 3     0     3   0-3  tenure                             4.0   0-7   0-8      0-7   95.604340   333.750
 4     0     4   0-4  TotalCharges                     120.0   0-9  0-10      0-9   27.897919   287.375
 5     0     5   0-5  Contract_Two year                  1.0  0-11  0-12     0-11   32.057739   512.625
 6     0     6   0-6  tenure                            60.0  0-13  0-14     0-13  120.693176   461.750
 7     0     7   0-7  TechSupport_No internet service    1.0  0-15  0-16     0-15   37.326447   149.750
 8     0     8   0-8  TechSupport_No internet service    1.0  0-17  0-18     0-17   34.968536   184.000
 9     0     9   0-9  TechSupport_Yes                    1.0  0-19  0-20     0-19    0.766754    65.500
10     0    10  0-10  MultipleLines_Yes                  1.0  0-21  0-22     0-21   19.335510   221.875
11     0    11  0-11  PhoneService_Yes                   1.0  0-23  0-24     0-23   19.035950   281.125
12     0    12  0-12  Leaf                               NaN   NaN   NaN      NaN   -0.191398   231.500
13     0    13  0-13  PaymentMethod_Electronic check     1.0  0-25  0-26     0-25   43.379410   320.875
14     0    14  0-14  Contract_Two year                  1.0  0-27  0-28     0-27   13.401367   140.875
15     0    15  0-15  Leaf                               NaN   NaN   NaN      NaN    0.050262    94.500
16     0    16  0-16  Leaf                               NaN   NaN   NaN      NaN   -0.052444    55.250
17     0    17  0-17  Leaf                               NaN   NaN   NaN      NaN   -0.058929   111.000
18     0    18  0-18  Leaf                               NaN   NaN   NaN      NaN   -0.148649    73.000
19     0    19  0-19  Leaf                               NaN   NaN   NaN      NaN    0.161464    63.875
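where the leaf values are stored in the Gain column (leaf nodes are the rows whose Feature column has the value Leaf). So I could add noise to the corresponding rows of the Gain column, but then I don't know how to convert the pandas DataFrame back into a Booster object / XGBoost model. How can I achieve this? Or is there a better way to retrieve and modify the leaf node values of an XGBoost model?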
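One route that might avoid the DataFrame round trip is to go through the JSON model file instead, since a Booster can be written with save_model and read back with load_model. The sketch below is only an illustration under assumptions I have not confirmed for every XGBoost version: that the trees of a gbtree model sit under learner -> gradient_booster -> model -> trees, that a node is a leaf when its left_children entry is -1, and that for leaves the split_conditions entry holds the leaf value. The helper names add_noise_to_leaves and perturb_model_file, and the toy model dict, are made up for this example.

```python
import copy
import json
import random


def add_noise_to_leaves(model_json, scale, seed=None):
    """Return a copy of a JSON-format model dict with Gaussian noise
    (mean 0, standard deviation `scale`) added to every leaf value.

    ASSUMPTION: the dict follows the XGBoost JSON layout for a gbtree
    booster, with leaves marked by left_children == -1 and their values
    stored in split_conditions.
    """
    rng = random.Random(seed)
    noisy = copy.deepcopy(model_json)  # leave the input untouched
    trees = noisy["learner"]["gradient_booster"]["model"]["trees"]
    for tree in trees:
        values = tree["split_conditions"]
        for node_id, left_child in enumerate(tree["left_children"]):
            if left_child == -1:  # assumed leaf marker
                values[node_id] += rng.gauss(0.0, scale)
    return noisy


def perturb_model_file(in_path, out_path, scale, seed=None):
    """Read a JSON model file, perturb its leaves, write the result."""
    with open(in_path) as f:
        model_json = json.load(f)
    with open(out_path, "w") as f:
        json.dump(add_noise_to_leaves(model_json, scale, seed), f)


# Hypothetical one-tree model: node 0 splits at 17.0, nodes 1 and 2 are leaves.
toy_model = {"learner": {"gradient_booster": {"model": {"trees": [
    {
        "left_children": [1, -1, -1],
        "right_children": [2, -1, -1],
        "split_conditions": [17.0, -0.191398, 0.050262],
    }
]}}}}

noisy_model = add_noise_to_leaves(toy_model, scale=0.01, seed=0)
```

With a real model the intended flow would be model.save_model("model.json"), then perturb_model_file("model.json", "noisy.json", scale=0.01), then model.load_model("noisy.json"). It would be worth comparing trees_to_dataframe before and after to check that only the Leaf rows changed, since the field layout is an assumption here.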