ValueError: Number of classes does not match size of target_names for confusion matrix

Question (votes: 0, answers: 1)

My code raises a ValueError.

y = df['weather type'] 
# y is an array of 'object' dtype with the 11 unique values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10:
# array([10, 10, 10, ..., 10, 10,  9])

# Encode the target variable (weather type) to numeric values - 
# not sure if I should have done this step because it seems to have messed up my target labels?
y_le = LabelEncoder()
y = y_le.fit_transform(y)

# the unique values of y_le.classes_ are '0', '1', '10', '11', '12', '2', '3', '5', '6', '7', '8'
# the unique values of y_val are 0, 1, 3, 4, 5, 6, 7, 8, 9, 10

# Initialize the XGBoost classifier
xgb_model = xgb.XGBClassifier(objective='multi:softmax', num_class=len(y_le.classes_))

# Train the model
xgb_model.fit(X_train, y_train)

# Make predictions on the validation set 
y_pred_val = xgb_model.predict(X_val)

# Evaluate the model
# Print classification report and confusion matrix
print("\nClassification Report:\n", classification_report(y_val, y_pred_val, target_names=y_le.classes_))

The ValueError is as follows:

--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[292], line 6
      2 y_pred_val = xgb_model.predict(X_val)
      4 # Evaluate the model
      5 # Print classification report and confusion matrix
----> 6 print("\nClassification Report:\n", classification_report(y_val, y_pred_val, target_names=y_le.classes_))
      7 #print("\nClassification Report:\n", classification_report(y_val, y_pred_val, labels=range(len(y_le.classes_)), target_names=y_le.classes_))

File ~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:2332, in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict, zero_division)
   2326         warnings.warn(
   2327             "labels size, {0}, does not match size of target_names, {1}".format(
   2328                 len(labels), len(target_names)
   2329             )
   2330         )
   2331     else:
-> 2332         raise ValueError(
   2333             "Number of classes, {0}, does not match size of "
   2334             "target_names, {1}. Try specifying the labels "
   2335             "parameter".format(len(labels), len(target_names))
   2336         )
   2337 if target_names is None:
   2338     target_names = ["%s" % l for l in labels]

ValueError: Number of classes, 10, does not match size of target_names, 11. Try specifying the labels parameter

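As far as I can tell, I have already set target_names=y_le.classes_. How can I fix this?

Also, my target variable, weather_type, has the 'object' dtype, and I am not sure whether I should convert it to numeric values for an XGBoost multi-class model.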

python xgboost confusion-matrix
1 Answer (0 votes)
y_val is not assigned anywhere in your code snippet. What is y_val?
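The error message itself suggests specifying the labels parameter. If y_val is a validation split of the encoded y, one encoded class may simply be absent from that split, so classification_report infers 10 labels while target_names has 11. Below is a minimal sketch of that fix; the arrays are hypothetical stand-ins rather than the asker's data, and zero_division=0 is an optional choice that reports 0 for classes with no samples.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the weather-type column (string labels).
raw = np.array(["0", "1", "2", "10", "1", "0", "10", "2"], dtype=object)

y_le = LabelEncoder()
y = y_le.fit_transform(raw)  # four classes are learned here

# Hypothetical validation split that happens to miss encoded class 3.
y_val = np.array([0, 1, 2, 1, 0])
y_pred_val = np.array([0, 1, 2, 0, 0])

# Pass every encoded label so labels and target_names have the same length.
all_labels = np.arange(len(y_le.classes_))
report = classification_report(
    y_val,
    y_pred_val,
    labels=all_labels,
    target_names=y_le.classes_,
    zero_division=0,  # classes absent from y_val are reported as 0
)
print(report)
```

The class missing from the split still appears in the report, with a support of 0.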

Check the data type of df['weather type']:

df['weather type'].dtypes

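For classification problems, sklearn-style estimators such as XGBClassifier expect integer class labels as the target. If your data is already of an int type, you do not need LabelEncoder().

LabelEncoder() takes object-typed labels and maps them to ints:

'cat' -> 0; 'dog' -> 1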
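A small sketch of that mapping follows. It also shows why the classes_ listed in the question look out of order: when the labels are strings, LabelEncoder sorts them lexicographically, so '10' is placed before '2'. The label values here are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

# String labels are sorted lexicographically, so '10' comes before '2'.
encoded = le.fit_transform(["0", "1", "2", "10"])

print(le.classes_.tolist())  # ['0', '1', '10', '2']
print(encoded.tolist())      # [0, 1, 3, 2]

# inverse_transform maps the encoded integers back to the original labels.
restored = le.inverse_transform(encoded)
print(restored.tolist())     # ['0', '1', '2', '10']
```

Because of this ordering, an encoded value is an index into classes_ and is not necessarily equal to the number the original string represents.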

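If your y has the object dtype but all of its values look like ints, the cause may be null values. Count them with:

df['weather type'].isnull().sum()

If there are null values, remove them and try again.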
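The check and cleanup can be sketched as follows. The DataFrame is a hypothetical example, and casting with astype(int) assumes that every remaining value really is an integer.

```python
import pandas as pd

# Hypothetical data: integer-looking values with one missing entry.
df = pd.DataFrame({"weather type": [10, 9, None, 3, 10]}, dtype=object)

print(df["weather type"].dtype)           # object
print(df["weather type"].isnull().sum())  # 1

# Drop rows whose target is missing, then cast the remaining values to int.
df = df.dropna(subset=["weather type"])
df["weather type"] = df["weather type"].astype(int)

print(df["weather type"].tolist())        # [10, 9, 3, 10]
```

Once the column has an integer dtype, whether LabelEncoder is still needed depends on whether the values already run from 0 to the number of classes minus one.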
