Fine-tuning an already-trained XGBoost classification model

Question

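I trained an XGBoost classification model for sentiment analysis of product reviews. In some cases, however, its predictions are not what I expect. For example, when I input the review "The delivery was a bit late, but the product is great", the model classifies it as negative (0), but I want to fine-tune the model on that exact case so that it labels the review as positive (1).

Is there a way to fine-tune an already-trained XGBoost model by adding specific data points like this? What is the best way to achieve this without retraining the whole model from scratch?

I tried the following function: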

# Fine tune the model
def fine_tune(model, inp, output, word2vec):
    model.fit(
        np.array([word2vec.get_mean_vector(tokenize(
            inp
        ))]), np.array([output])
    )

    return model

However, when I run it, it retrains the entire model on the single data point I provided.

Any guidance or suggestions would be appreciated. Thanks!

Answer 1

Thanks to @Laassairi Abdellah, who pointed me to incremental training. With that knowledge, I implemented this function:

import xgboost as xgb
import numpy as np

def fine_tune(model_, X, y, loop=False, num_boost_rounds=30, params=None):
    """
    Fine-tune an XGBoost model using incremental training.

    Args:
    - model_: str, xgboost.Booster, or a fitted scikit-learn wrapper (e.g. XGBClassifier);
      path to, or object of, the model to be fine-tuned.
    - X: array-like, shape (n_samples, n_features), input data for training.
    - y: array-like, shape (n_samples,), output (target) data for training.
    - loop: bool, repeat the training until the model predicts y perfectly on X.
    - num_boost_rounds: int, number of boosting rounds.
    - params: dict, parameters for the model.

    Returns:
    - model: the fine-tuned XGBoost model.
    """
    
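    if isinstance(model_, str):
        # Load the existing model from a file
        model = xgb.Booster()
        model.load_model(model_)

    elif isinstance(model_, xgb.Booster):
        # Already a Booster: continue training from it directly
        model = model_

    else:
        # Otherwise assume a fitted scikit-learn wrapper such as XGBClassifier
        try:
            model = model_.get_booster()
        except AttributeError:
            raise ValueError("The model must be a file path, an xgboost.Booster, or a fitted scikit-learn XGBoost model.")

    if isinstance(model_, (xgb.Booster, str)) and params is None:
        # A bare Booster does not expose its training parameters
        raise ValueError("The params argument must be provided when loading a model from a file or a Booster model.")

    # get_xgb_params() returns only the booster parameters, without
    # wrapper-only settings such as n_estimators
    param = params if params is not None else model_.get_xgb_params()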

    # Convert the input to DMatrix
    dX = xgb.DMatrix(X, label=y)

    # Train the model
    model = xgb.train(param, dX, num_boost_rounds, xgb_model=model)

    if loop:
        # Loop the training process until the model predicts perfectly
        while True:
            y_pred = model.predict(dX)
            y_pred = np.where(y_pred > 0.5, 1, 0)

            if np.all(y_pred == y):
                break
            
            model = xgb.train(param, dX, num_boost_rounds, xgb_model=model)

    if not isinstance(model_, (str, xgb.Booster)):
        # Update the internal booster
        model_._Booster = model
    
    return model

The loop part of this code is specific to my binary classification use case, since the label is either 1 or 0.

Usage example:

fine_tune(model,
    np.array([word2vec.get_mean_vector(tokenize(
        "The delivery was a tiny bit late but the product was sleek and high quality"
    ))]), np.array([1]), loop=True
)
