I've run into some strange behavior with the xgboost classifier. Copying the code from a reply to this post:
import xgboost as xgb
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
xgb_clf = xgb.XGBClassifier()
xgb_clf = xgb_clf.fit(X_train, y_train)
print(xgb_clf.predict(X_test))
print(xgb_clf.predict_proba(X_test))
>>[0 0 1 0 0 1 0 0 1 1]
[[0.97378635 0.02621362]
[0.97106457 0.0289354 ]
[0.45146966 0.54853034]
[0.9181994 0.08180059]
[0.97378635 0.02621362]
[0.4264453 0.5735547 ]
[0.6279408 0.37205923]
[0.991474 0.00852604]
[0.06204838 0.9379516 ]
[0.08833408 0.9116659 ]]
So far so good. However, the model still makes predictions even when the input consists entirely of NaN values.
b = np.empty([3,2])
b[:] = np.nan
xgb_clf.predict_proba(b)
>>array([[0.8939177 , 0.10608231],
[0.8939177 , 0.10608231],
[0.8939177 , 0.10608231]], dtype=float32)
This caught me completely off guard. Am I missing some parameter that would make the classifier output NaN predictions as well?
This is expected behavior. XGBoost treats NaN as a missing value: during training, each tree split learns a default direction for missing values, so even a row that is entirely NaN follows a valid path through every tree and yields a prediction. If you want NaN inputs to produce NaN outputs, you have to override the model's predictions manually.