How can I compute feature_importances_ in XGBoost using only the iterations up to best_iteration?

Question

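I trained an XGBRegressor model with early stopping. As far as I understand, model.feature_importances_ computes feature importances over the full training history (that is, it also includes the "patience" iterations set by early_stopping_rounds). However, I need the feature importances of the model only up to the best iteration.

Here is the sample code: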

from xgboost import XGBRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Prepare data
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model with early stopping
xgb_model = XGBRegressor(n_estimators=1000, early_stopping_rounds=100, eval_metric="rmse")
xgb_model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

xgb_model.feature_importances_

But this gives me feature importances for the full model, not just up to the best iteration:

best_iteration = xgb_model.best_iteration

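I can't retrain a new XGBRegressor with n_estimators=best_iteration, because that would almost double the runtime (this snippet is part of a larger codebase). Is there a way to achieve this without retraining? Note that, unfortunately, .feature_importances_ has no iteration_range option.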

Answer 1

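You can use xgb_model.get_booster() to get the underlying Booster object, then call get_score() on that booster restricted to the trees up to best_iteration to get the feature importances of the best model. Finally, convert the feature scores into a format similar to feature_importances_.

Here is sample code you can use: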

from xgboost import XGBRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Prepare data
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model with early stopping
xgb_model = XGBRegressor(n_estimators=1000, early_stopping_rounds=100, eval_metric="rmse")
xgb_model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

# Get the best iteration
best_iteration = xgb_model.best_iteration

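# Access the booster and keep only the trees up to and including best_iteration.
# Booster.get_score() has no iteration argument, so slice the booster instead.
booster = xgb_model.get_booster()
best_booster = booster[: best_iteration + 1]
importance_dict = best_booster.get_score(importance_type='weight')

# Convert the importance dict into a list ordered by feature index.
# Features never used in a split are absent from the dict, so default to 0.
importances = [importance_dict.get(f'f{i}', 0) for i in range(X.shape[1])]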

# Print the importances
print(importances)
