XGBoost plot_importance doesn't show feature names

Question

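I'm using XGBoost with Python and have successfully trained a model using the XGBoost train() function called on DMatrix data. The matrix was created from a Pandas dataframe, which has feature names for the columns.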

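import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.model_selection import train_test_split

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y,
                                              test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(Xtrain, label=ytrain)
dval = xgb.DMatrix(Xval, label=yval)

# early_stopping_rounds needs at least one evaluation set in evals
model = xgb.train(xgb_params, dtrain, num_boost_round=60,
                  evals=[(dval, "val")], early_stopping_rounds=50,
                  maximize=False, verbose_eval=10)

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
xgb.plot_importance(model, max_num_features=5, ax=ax)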

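Now I want to see the feature importance using the xgboost.plot_importance() function, but the resulting plot doesn't show the feature names. Instead, the features are listed as f1, f2, f3, etc., as shown below.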

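I think the problem is that I converted my original Pandas dataframe into a DMatrix. How can I properly associate the feature names so that the feature importance plot shows them?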

Answer 1

You want to use the feature_names parameter when creating your xgb.DMatrix:

dtrain = xgb.DMatrix(Xtrain, label=ytrain, feature_names=feature_names)

Answer 2

train_test_split converts the dataframe to a numpy array, which no longer has the column information.

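You can do what @piRSquared suggested and pass the feature names as a parameter to the DMatrix constructor. Alternatively, you can convert the numpy arrays returned from train_test_split back to DataFrames and then use your code.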

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

Xtrain, Xval, ytrain, yval = train_test_split(df[feature_names], y,
                                              test_size=0.2, random_state=42)

# Rebuild DataFrames so the column names are restored
Xtrain = pd.DataFrame(data=Xtrain, columns=feature_names)
Xval = pd.DataFrame(data=Xval, columns=feature_names)

dtrain = xgb.DMatrix(Xtrain, label=ytrain)

Answer 3

If you're using the scikit-learn wrapper, you'll need to access the underlying XGBoost Booster and set the feature names on it, instead of on the scikit model, like so:

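import joblib
import xgboost

model = joblib.load("your_saved.model")
# Set the names on the underlying Booster, not on the sklearn estimator
model.get_booster().feature_names = ["your", "feature", "name", "list"]
xgboost.plot_importance(model.get_booster())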

Answer 4

Another way I found while playing around with feature_names. While experimenting, I wrote this, and it works on XGBoost v0.80, which I'm currently running.

import matplotlib.pyplot as plt
import xgboost as xgb

## Saving the model to disk
model.save_model('foo.model')
with open('foo_fnames.txt', 'w') as f:
    f.write('\n'.join(model.feature_names))

## Later, when you want to retrieve the model...
model2 = xgb.Booster({"nthread": nThreads})  # nThreads: thread count defined elsewhere
model2.load_model("foo.model")

with open("foo_fnames.txt", "r") as f:
    feature_names2 = f.read().split("\n")

model2.feature_names = feature_names2
model2.feature_types = None
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
xgb.plot_importance(model2, max_num_features=5, ax=ax)

So this saves the feature_names separately and adds them back later. For some reason, feature_types also needs to be initialized, even if the value is None.

Answer 5

With the Scikit-Learn wrapper interface XGBClassifier, plot_importance returns a matplotlib Axes object, so we can use axes.set_yticklabels.

from xgboost import plot_importance

plot_importance(model).set_yticklabels(['feature1', 'feature2'])
