How do I train XGBoost with class probabilities instead of one-hot encoded labels?

Question

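I am trying to train an XGBoost classifier from a training dataset and training labels. The labels are one-hot encoded, but instead of passing hard classes such as [0,1,0,0], I want to pass probabilities such as [0,0.6,0.4,0] for a given training data point. The reason is that I want to implement the mixup algorithm for data augmentation, which produces the augmented labels as floating-point mixtures of one-hot codes.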
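For context, here is a minimal sketch of how mixup ends up producing such soft labels. The helper name `mixup`, the `alpha` value, and the choice of one mixing weight per sample are assumptions made for illustration; they are not taken from the question.

```python
import numpy as np

def mixup(X, y_one_hot, alpha=0.2, rng=None):
    """Blend each sample with a random partner (hypothetical helper).

    Returns mixed features and soft labels whose rows still sum to 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    lam = rng.beta(alpha, alpha, size=(n, 1))  # assumed: one mixing weight per sample
    perm = rng.permutation(n)                  # random partner for every sample
    X_mix = lam * X + (1.0 - lam) * X[perm]
    y_mix = lam * y_one_hot + (1.0 - lam) * y_one_hot[perm]
    return X_mix, y_mix

rng = np.random.default_rng(0)
X = rng.random((100, 16))
y = np.eye(4)[rng.integers(0, 4, size=100)]    # one-hot labels for 4 classes
X_mix, y_mix = mixup(X, y, rng=rng)            # y_mix rows look like [0, 0.6, 0.4, 0]
```

Each row of `y_mix` has at most two non-zero entries, because every augmented point is a blend of exactly two original points.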

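However, I get an error on model.fit, because it expects one-hot encoded labels rather than a probability for each class. How can I use this data augmentation algorithm with XGBoost?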

import xgboost as xgb
import numpy as np

# Generate some random data
X = np.random.rand(100, 16)

# Generate random one-hot encoded target variable
y_one_hot = np.random.randint(0, 4, size=(100,))
y = np.eye(4)[y_one_hot]

# Convert one-hot encoded target variable to probabilities
y_proba = np.zeros((y.shape[0], y.shape[1]))
for i, row in enumerate(y):
    y_proba[i] = row / np.sum(row)

# Define the XGBoost model
model = xgb.XGBClassifier(objective='multi:softprob', num_class=4)

# Train the model (this is the call that raises the error)
model.fit(X, y_proba)

# Generate some test data
X_test = np.random.rand(10, 16)

# Predict the probabilities for each class
y_pred_proba = model.predict_proba(X_test)

# Get the predicted class for each sample
y_pred = np.argmax(y_pred_proba, axis=1)

Answer 1

Idea

You can use the sample_weight parameter to work around the label encoding restriction.

Example

Suppose your training data consists of these instances x_i, with probabilities as labels:

x_1  [0, 1, 0, 0]
x_2  [0, 0, 1, 0]
x_3  [0, .6, .4, 0]
x_4  [.7, 0, 0, .3]

Transform them into these instances, each with a single hard label:

x_1  1
x_2  2
x_3  1
x_3  2
x_4  0
x_4  3

Then, when you pass this transformed data to the fit method, also pass the argument sample_weight=[1, 1, .6, .4, .7, .3].
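The transformation in this example can be checked with a few lines of NumPy. This is only a sketch: the variable names `rows`, `labels` and `weights` are chosen here for illustration, and `np.nonzero` is one convenient way to list the non-zero (instance, class) pairs in row order.

```python
import numpy as np

# Soft labels for x_1 .. x_4 from the example
y_proba = np.array([
    [0.0, 1.0, 0.0, 0.0],  # x_1
    [0.0, 0.0, 1.0, 0.0],  # x_2
    [0.0, 0.6, 0.4, 0.0],  # x_3
    [0.7, 0.0, 0.0, 0.3],  # x_4
])

# Every non-zero entry becomes one (instance, hard label) pair
rows, labels = np.nonzero(y_proba)
weights = y_proba[rows, labels]

print(rows)     # index of the original instance for each new row
print(labels)   # hard class label for each new row
print(weights)  # probability mass used as the sample weight
```

Here `rows` tells you which original x_i to repeat, `labels` is the hard label list, and `weights` is what goes into sample_weight.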

General implementation

Given your model, X, and y_proba:

n_samples, n_classes = y_proba.shape
X_upsampled = X.repeat(n_classes, axis=0)
y_direct = np.tile(range(n_classes), n_samples)
sample_weights = y_proba.ravel()

model.fit(X_upsampled, y_direct, sample_weight=sample_weights)
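The snippet above keeps one row per (sample, class) pair, including pairs whose weight is zero. As a possible refinement, the sketch below wraps the same steps in a function and optionally drops the zero-weight rows, which keeps the expanded dataset small when most probabilities are zero (as with mixup). The name `expand_soft_labels` and the `drop_zero` flag are hypothetical and not part of XGBoost.

```python
import numpy as np

def expand_soft_labels(X, y_proba, drop_zero=True):
    """Turn (X, soft labels) into (X_rep, hard labels, sample weights).

    Hypothetical helper; it only prepares arrays and does not call XGBoost.
    """
    n_samples, n_classes = y_proba.shape
    X_rep = np.repeat(X, n_classes, axis=0)              # each row repeated n_classes times
    y_direct = np.tile(np.arange(n_classes), n_samples)  # 0..n_classes-1 for every sample
    weights = y_proba.ravel()                            # row-major, matches the two arrays above
    if drop_zero:
        keep = weights > 0                               # zero-weight rows add nothing to the loss
        X_rep, y_direct, weights = X_rep[keep], y_direct[keep], weights[keep]
    return X_rep, y_direct, weights

rng = np.random.default_rng(0)
X = rng.random((4, 16))
y_proba = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.7, 0.0, 0.0, 0.3],
])
X_up, y_up, w_up = expand_soft_labels(X, y_proba)
# Then, with `model` defined as in the question:
# model.fit(X_up, y_up, sample_weight=w_up)
```

One caveat when dropping rows: as far as I know, XGBClassifier infers the set of classes from the labels it sees, so every class should still appear at least once in the hard labels. If that is not guaranteed, calling the helper with drop_zero=False keeps all classes present.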
