How to get feature importance in xgboost?

Question

I'm using xgboost to build a model and trying to find the importance of each feature with get_fscore(), but it returns {}.

My training code is:

import xgboost as xgb

# X: feature matrix, Y: labels (both defined elsewhere)
dtrain = xgb.DMatrix(X, label=Y)
watchlist = [(dtrain, 'train')]
param = {'max_depth': 6, 'learning_rate': 0.03}
num_round = 200
bst = xgb.train(param, dtrain, num_round, watchlist)

Is something wrong with my training? How do I get feature importance in xgboost?

Answer 1

With your code, you can get the importance of each feature as a dict:

bst.get_score(importance_type='gain')

>>{'ftr_col1': 77.21064539577829,
   'ftr_col2': 10.28690566363971,
   'ftr_col3': 24.225014841466294,
   'ftr_col4': 11.234086283060112}
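Because get_score() returns a plain dict, ranking the features only needs a sort on the values. A minimal sketch in plain Python, reusing the example scores shown above; the dict literal stands in for the real get_score() output:

```python
# Example scores copied from the get_score(importance_type='gain') output above;
# in real code this dict would come from bst.get_score(importance_type='gain').
scores = {
    'ftr_col1': 77.21064539577829,
    'ftr_col2': 10.28690566363971,
    'ftr_col3': 24.225014841466294,
    'ftr_col4': 11.234086283060112,
}

# Sort (feature, score) pairs by score, largest first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
```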

Explanation: get_score() is a method of the Booster object returned by the train() API, defined as:

get_score(fmap='', importance_type='weight')

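fmap (str (optional)) - The name of the feature map file.
importance_type:
'weight' - the number of times a feature is used to split the data across all trees.
'gain' - the average gain across all splits the feature is used in.
'cover' - the average coverage across all splits the feature is used in.
'total_gain' - the total gain across all splits the feature is used in.
'total_cover' - the total coverage across all splits the feature is used in.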
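To make the difference between these types concrete, here is a toy re-computation from a hypothetical list of split records (feature, gain, cover). The records and the helper name importance_from_splits are made up for illustration; this is not how XGBoost computes the scores internally, only a sketch of the definitions above. By those definitions, 'total_gain' equals 'gain' × 'weight', and 'total_cover' equals 'cover' × 'weight'.

```python
from collections import defaultdict

# Hypothetical split records gathered from all trees: (feature, gain, cover).
splits = [
    ('ftr_a', 10.0, 100.0),
    ('ftr_b', 4.0, 50.0),
    ('ftr_a', 6.0, 40.0),
    ('ftr_b', 2.0, 30.0),
    ('ftr_b', 3.0, 20.0),
]

def importance_from_splits(splits, importance_type):
    """Toy version of the five importance_type definitions (illustration only)."""
    count = defaultdict(int)
    gain = defaultdict(float)
    cover = defaultdict(float)
    for feature, split_gain, split_cover in splits:
        count[feature] += 1
        gain[feature] += split_gain
        cover[feature] += split_cover
    if importance_type == 'weight':
        return dict(count)                              # number of splits
    if importance_type == 'total_gain':
        return dict(gain)                               # summed gain
    if importance_type == 'total_cover':
        return dict(cover)                              # summed cover
    if importance_type == 'gain':
        return {f: gain[f] / count[f] for f in count}   # average gain per split
    if importance_type == 'cover':
        return {f: cover[f] / count[f] for f in count}  # average cover per split
    raise ValueError(f"unknown importance_type: {importance_type}")

for t in ('weight', 'gain', 'cover', 'total_gain', 'total_cover'):
    print(t, importance_from_splits(splits, t))
```

With these made-up records the ranking depends on the type: ftr_b is split on more often (higher 'weight'), while ftr_a contributes more gain per split (higher 'gain').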

https://xgboost.readthedocs.io/en/latest/python/python_api.html

Answer 2

Try this:

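# clf: a fitted search object such as GridSearchCV; newer XGBoost versions renamed booster() to get_booster()
fscore = clf.best_estimator_.get_booster().get_fscore()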

Answer 3

I'm not sure how to get the values, but there is a nice way to plot feature importance:

import matplotlib.pyplot as plt
import xgboost as xgb

model = xgb.train(params, d_train, 1000, watchlist)
fig, ax = plt.subplots(figsize=(12, 18))
xgb.plot_importance(model, max_num_features=50, height=0.8, ax=ax)
plt.show()

Answer 4

Using the sklearn API and XGBoost >= 0.81:

clf.get_booster().get_score(importance_type="gain")

or

regr.get_booster().get_score(importance_type="gain")

For this to work correctly, X must be a pandas.DataFrame when you call regr.fit (or clf.fit), so that the scores are keyed by your column names.
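When X is a NumPy array instead, the booster has no column names to report and, as far as I know, the keys fall back to XGBoost's default names 'f0', 'f1', and so on. A minimal sketch for mapping those keys back to your own column names, assuming the default 'f<index>' naming and that columns lists the features in the same order as the training matrix; the scores, column names, and the helper rename_default_keys are all hypothetical:

```python
# Hypothetical get_score() output for a model trained on a NumPy array:
# keys use the default "f<column index>" names. 'f1' is absent because
# features never used in a split do not appear in the dict.
raw_scores = {'f0': 12.5, 'f2': 3.0, 'f3': 7.25}

# Column names in the same order as the columns of the training array (assumed).
columns = ['age', 'income', 'height', 'weight_kg']

def rename_default_keys(scores, columns):
    """Map 'f<i>' keys to columns[i]; leave any other key unchanged."""
    renamed = {}
    for key, value in scores.items():
        if key.startswith('f') and key[1:].isdigit():
            renamed[columns[int(key[1:])]] = value
        else:
            renamed[key] = value
    return renamed

print(rename_default_keys(raw_scores, columns))
```

Only apply this to models trained without feature names; if a real column happened to be called something like 'f1', the mapping would be wrong.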

Answer 5

For feature importance, try this:

Classification:

import pandas as pd
pd.DataFrame(list(bst.get_fscore().items()), columns=['feature', 'importance']).sort_values('importance', ascending=False)

Regression:

xgb.plot_importance(bst)

Answer 6

First, build the model with XGBoost:

from xgboost import XGBClassifier, plot_importance
model = XGBClassifier()
model.fit(train, label)

model.feature_importances_ is an array, so we can sort its indices in descending order of importance:

import numpy as np
sorted_idx = np.argsort(model.feature_importances_)[::-1]

Then print all the sorted importances together with the column names as lists (I assume the data was loaded with Pandas):

for index in sorted_idx:
    print([train.columns[index], model.feature_importances_[index]])

We can also plot the importance with XGBoost's built-in function:

from matplotlib import pyplot

plot_importance(model, max_num_features=15)
pyplot.show()

If you want, you can use the max_num_features parameter of plot_importance to limit the number of features shown.
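max_num_features only limits what is drawn. If you want the same top-k cut as numbers, heapq.nlargest from the standard library does it on a dict such as the one get_score() returns. The scores below and the helper name top_k_features are hypothetical:

```python
import heapq

def top_k_features(scores, k):
    """Return the k (feature, score) pairs with the highest scores, largest first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Hypothetical importance scores, e.g. from booster.get_score(importance_type='gain').
scores = {'a': 0.5, 'b': 2.0, 'c': 1.25, 'd': 0.75}

print(top_k_features(scores, 2))
```

If k is larger than the number of features, nlargest simply returns all of them, sorted.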

Answer 7

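For anyone who runs into this issue when using xgb.XGBRegressor(): the workaround I used was to keep the data in a pandas.DataFrame() or numpy.array() instead of converting it to a dmatrix(). I also had to make sure the gamma parameter was not specified for XGBRegressor.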

fit = alg.fit(dtrain[ft_cols].values, dtrain['y'].values)
ft_weights = pd.DataFrame(fit.feature_importances_, columns=['weights'], index=ft_cols)

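After fitting the regressor, fit.feature_importances_ returns an array of weights, which I assume is in the same order as the feature columns of the pandas DataFrame.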

My current setup is Ubuntu 16.04, the Anaconda distribution, Python 3.6, xgboost 0.6 and sklearn 18.1.
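The column-order assumption in this answer can be sidestepped by building the array yourself from the booster's get_score() dict. The sketch below does that in plain Python: it looks up each column name in the dict, uses 0.0 for features the dict omits (features never used in a split do not appear in it), and divides by the total so the values sum to 1. As far as I know, recent versions of XGBoost's sklearn wrapper compute feature_importances_ in a similar way, but treat that as an assumption. The scores, column names, and the helper aligned_importances are hypothetical:

```python
def aligned_importances(scores, columns, normalize=True):
    """Return importances in the order of `columns`, 0.0 for missing features."""
    values = [float(scores.get(name, 0.0)) for name in columns]
    total = sum(values)
    # Normalize so the values sum to 1, unless every score is zero.
    if normalize and total > 0:
        values = [v / total for v in values]
    return values

# Hypothetical get_score() output; 'x2' was never used in a split, so it is absent.
scores = {'x1': 3.0, 'x3': 1.0}
ft_cols = ['x1', 'x2', 'x3']

weights = aligned_importances(scores, ft_cols)
for name, w in zip(ft_cols, weights):
    print(name, w)
```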

Answer 8

Get a table with the scores and feature names, then plot it.

import pandas as pd

feature_important = model.get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())

data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by = "score", ascending=False)
data.plot(kind='barh')

