My code raises a ValueError.
y = df['weather type']
# y is an array with the 11 unique values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
# and of an 'object' type:
array([10, 10, 10, ..., 10, 10, 9])
# Encode the target variable (weather type) to numeric values -
# not sure if I should have done this step because it seems to have messed up my target labels?
y_le = LabelEncoder()
y = y_le.fit_transform(y)
# the unique values of y_le.classes_ are '0', '1', '10', '11', '12', '2', '3', '5', '6', '7', '8'
# the unique values of y_val are 0, 1, 3, 4, 5, 6, 7, 8, 9, 10
# Initialize the XGBoost classifier
xgb_model = xgb.XGBClassifier(objective='multi:softmax', num_class=len(y_le.classes_))
# Train the model
xgb_model.fit(X_train, y_train)
# Make predictions on the validation set
y_pred_val = xgb_model.predict(X_val)
# Evaluate the model
# Print classification report and confusion matrix
print("\nClassification Report:\n", classification_report(y_val, y_pred_val, target_names=y_le.classes_))
The ValueError is as follows:
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[292], line 6
2 y_pred_val = xgb_model.predict(X_val)
4 # Evaluate the model
5 # Print classification report and confusion matrix
----> 6 print("\nClassification Report:\n", classification_report(y_val, y_pred_val, target_names=y_le.classes_))
7 #print("\nClassification Report:\n", classification_report(y_val, y_pred_val, labels=range(len(y_le.classes_)), target_names=y_le.classes_))
File ~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:2332, in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict, zero_division)
2326 warnings.warn(
2327 "labels size, {0}, does not match size of target_names, {1}".format(
2328 len(labels), len(target_names)
2329 )
2330 )
2331 else:
-> 2332 raise ValueError(
2333 "Number of classes, {0}, does not match size of "
2334 "target_names, {1}. Try specifying the labels "
2335 "parameter".format(len(labels), len(target_names))
2336 )
2337 if target_names is None:
2338 target_names = ["%s" % l for l in labels]
ValueError: Number of classes, 10, does not match size of target_names, 11. Try specifying the labels parameter
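As far as I can tell, I have already set target_names=y_le.classes_. How can I fix this?
Also, my target variable, weather_type, has an 'object' dtype, and I'm not sure whether I should convert it to numeric for the XGBoost multiclass model.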
What is y_val?
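The mismatch in the traceback (10 classes against 11 target names) can be reproduced with a small sketch. It assumes y_val is a validation split that happens to miss one of the encoded classes; the toy labels and predictions are made up for illustration and are not the asker's data. Passing labels= explicitly, as the error message suggests, is one way to keep the report aligned with target_names.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the real target: 11 string classes (assumed values).
y_le = LabelEncoder()
y_le.fit([str(i) for i in range(11)])

# Hypothetical validation labels that happen to miss encoded class 2.
y_val = np.array([0, 1, 3, 4, 5, 6, 7, 8, 9, 10])
y_pred_val = y_val.copy()  # pretend predictions, for illustration only

# labels= makes the report cover all 11 encoded classes, so its length
# matches target_names even when a class is absent from y_val.
all_labels = np.arange(len(y_le.classes_))
target_names = [str(c) for c in y_le.classes_]
report = classification_report(
    y_val,
    y_pred_val,
    labels=all_labels,
    target_names=target_names,
    zero_division=0,  # avoid warnings for the class with no samples
)
print(report)
```

Without labels=, classification_report infers the classes from y_val and y_pred_val only, which is where the count of 10 in the error comes from.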
Check the dtype of df['weather type']:
df['weather type'].dtypes
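For a classification problem, the target passed to the model should be integer class labels. If your dtype is already int, you don't need LabelEncoder(). LabelEncoder() takes object-type labels and maps them to ints, for example:
'cat' -> 0; 'dog' -> 1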
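A minimal sketch of that mapping, using made-up labels. It also shows that labels stored as strings are sorted as strings, which would explain why classes_ in the question lists '10' before '2'.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
encoded = le.fit_transform(["cat", "dog", "dog", "cat"])
print(list(le.classes_))  # ['cat', 'dog']
print(list(encoded))      # [0, 1, 1, 0]

# Numeric-looking strings are sorted lexicographically, not numerically.
le_str = LabelEncoder()
le_str.fit(["0", "1", "2", "10"])
print(list(le_str.classes_))  # ['0', '1', '10', '2']

# inverse_transform maps integer codes back to the original labels.
print(list(le.inverse_transform([0, 1])))  # ['cat', 'dog']
```

Because of that ordering, the encoded integer for a string label such as '2' need not equal 2, so predictions should be decoded with inverse_transform rather than read as the original weather codes.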
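If your y has object dtype even though every value looks like an int, null values may be the cause. Count them with:
df['weather type'].isnull().sum()
If there are nulls, drop those rows and try again.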
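A sketch of that cleanup on a toy frame. The column name 'weather type' comes from the question; the values and the 'temp' feature column are assumptions for illustration.

```python
import pandas as pd

# Toy frame standing in for the real data (made-up values).
df = pd.DataFrame({
    "temp": [20.1, 18.5, 25.0, 30.2],
    "weather type": pd.Series([10, 9, None, 0], dtype="object"),
})

print(df["weather type"].dtype)           # object
print(df["weather type"].isnull().sum())  # number of missing targets

# Drop rows whose target is missing, then cast the target to int.
df_clean = df.dropna(subset=["weather type"]).copy()
df_clean["weather type"] = df_clean["weather type"].astype(int)

print(df_clean["weather type"].dtype)
```

If the resulting integers are not consecutive starting at 0, encoding them with LabelEncoder is probably still worthwhile, since recent XGBoost versions appear to expect class labels in the range 0 to num_class - 1.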