Getting the actual feature names from an XGBoost model

Question

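I know this question has been asked several times, and I have read the answers, but I still can't figure it out. Like others, my feature names end up displayed as f56, f234, f12 and so on, and I want to use the actual names instead of the f-somethings! Here is the part of the code related to the model: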

import xgboost as xgb

optimized_params, xgb_model = find_best_parameters()  # where fitting and GridSearchCV happen
# If X_train_scaled is a NumPy array (e.g. StandardScaler output), this DMatrix
# carries no real column names, only XGBoost's defaults (or None)
xgdmat = xgb.DMatrix(X_train_scaled, y_train_scaled)
feature_names = xgdmat.feature_names
final_gb = xgb.train(optimized_params, xgdmat,
                     num_boost_round=find_optimal_num_trees(optimized_params, xgdmat))


final_gb.get_fscore()
# Maps 'f0', 'f1', ... to whatever xgdmat.feature_names contains
mapper = {'f{0}'.format(i): v for i, v in enumerate(xgdmat.feature_names)}
mapped = {mapper[k]: v for k, v in final_gb.get_fscore().items()}
mapped
xgb.plot_importance(mapped, color='red')
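The mapper above is built from xgdmat.feature_names, which only holds real names if the DMatrix was given them; for an unnamed array it holds XGBoost's defaults (or None), so 'f0' is just mapped back to 'f0'. A minimal sketch of the same renaming idea, assuming the original column names are available in training order (for example X_train.columns, provided scaling did not reorder the columns); rename_scores and the sample data are hypothetical:

```python
def rename_scores(scores, columns):
    """Replace XGBoost's default keys ('f0', 'f1', ...) with real column names.

    scores  -- dict such as the one returned by Booster.get_fscore()
    columns -- original column names, in the same order as the training array
    """
    # Default names are 'f' + zero-based column position, so 'f<i>' -> columns[i]
    mapper = {f"f{i}": name for i, name in enumerate(columns)}
    # Keys that already carry real names (model trained with feature_names) pass through
    return {mapper.get(key, key): value for key, value in scores.items()}


# Hypothetical scores standing in for final_gb.get_fscore()
fscore = {"f0": 12, "f2": 7}
columns = ["age", "income", "tenure"]
print(rename_scores(fscore, columns))  # {'age': 12, 'tenure': 7}
```

The renamed dict can then be passed to xgb.plot_importance, which accepts a dict of the form returned by get_fscore().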

I also tried:

import pandas as pd

feature_important = final_gb.get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())

data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by="score", ascending=False)
data.plot(kind='barh')

But the features are still shown as f + number. Thanks a lot for your help.
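The DataFrame is indexed by the keys of get_score(), which are still the default names, so the bar labels cannot change. A sketch that renames the keys first and also adds back features that get_score() leaves out because they were never used in a split; ranked_importance and the sample weights are hypothetical, and the column list is assumed to be in training order:

```python
def ranked_importance(scores, columns):
    """Return (name, score) pairs sorted by score, highest first.

    get_score()/get_fscore() omit features that were never used in a split,
    so those are added back here with a score of 0.
    """
    mapper = {f"f{i}": name for i, name in enumerate(columns)}
    named = {mapper.get(k, k): v for k, v in scores.items()}
    # Include unused features explicitly so the ranking covers every column
    full = {name: named.get(name, 0) for name in columns}
    return sorted(full.items(), key=lambda item: item[1], reverse=True)


# Hypothetical output of final_gb.get_score(importance_type='weight')
weights = {"f1": 4, "f0": 9}
for name, score in ranked_importance(weights, ["age", "income", "tenure"]):
    print(name, score)
```

The returned list could then be loaded with pd.DataFrame(ranked, columns=["feature", "score"]).set_index("feature") and plotted with .plot(kind='barh') as before.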

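[What I am doing now is taking the number at the end of the f-name, e.g. 234 from f234, and using it in X_train.columns[234] to see the actual name. However, I am having second thoughts about whether the name obtained this way really is the feature that f234 represents.]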

Answer 1

First build a dictionary from your original features, then use it to map the f-names back to the feature names.

import matplotlib.pyplot as plt
import xgboost as xgb

# create dict to use later: column position -> original name
# (taken from the unscaled DataFrame; a scaled NumPy array has no .columns)
myfeatures = X_train.columns
dict_features = dict(enumerate(myfeatures))

# feature importance plotted with names f0, f1, ...
axsub = xgb.plot_importance(final_gb)

# get the original names back: 'f234' -> 234 -> X_train.columns[234]
lst_yticklabels = [dict_features[int(t.get_text().lstrip('f'))]
                   for t in axsub.get_yticklabels()]

axsub.set_yticklabels(lst_yticklabels)
print(dict_features)
plt.show()


Answer 2

You can solve this by passing the feature_names parameter when you create your xgb.DMatrix:

feature_names = list(X_train.columns)  # original column names, in training order
xgdmat = xgb.DMatrix(X_train_scaled, y_train_scaled, feature_names=feature_names)
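Once the DMatrix carries real names, the trained Booster keeps them, so get_fscore(), get_score() and plot_importance report those names instead of f-numbers. Passing a pandas DataFrame (for instance pd.DataFrame(X_train_scaled, columns=X_train.columns)) has the same effect, since DMatrix reads the column names from it. XGBoost rejects some names, so here is a sketch of a pre-check based on the constraints I am aware of (one name per column, all unique, no '[', ']' or '<'); check_feature_names is a hypothetical helper, and converting non-string labels with str() is a design choice of this sketch, not something XGBoost does for you:

```python
def check_feature_names(names, n_columns):
    """Validate names before xgb.DMatrix(..., feature_names=names).

    Assumed XGBoost constraints: one name per column, no duplicates,
    and no '[', ']' or '<' characters. Non-string labels are converted with str().
    """
    names = [str(n) for n in names]
    if len(names) != n_columns:
        raise ValueError(f"{len(names)} names for {n_columns} columns")
    if len(set(names)) != len(names):
        raise ValueError("feature names must be unique")
    bad = [n for n in names if any(ch in n for ch in "[]<")]
    if bad:
        raise ValueError(f"names contain '[', ']' or '<': {bad}")
    return names


# Usage with the question's variables (assumed to exist):
# feature_names = check_feature_names(X_train.columns, X_train_scaled.shape[1])
# xgdmat = xgb.DMatrix(X_train_scaled, y_train_scaled, feature_names=feature_names)

print(check_feature_names(["age", "income", 3], 3))  # ['age', 'income', '3']
```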
