I have a logistic regression model that predicts a binary output (0 or 1). I want to look at precision/recall for class 0 and plot the corresponding curve. I am using this code:
from sklearn import linear_model
from sklearn.metrics import precision_recall_fscore_support, precision_recall_curve
import matplotlib.pyplot as plt

clf = linear_model.LogisticRegression().fit(ohe_X_train, y_train)
# Predict labels on the one-hot encoded test set
clf_predictions = clf.predict(ohe_X_test)

class_of_interest = 0
# P/R based on precision_recall_fscore_support
precision, recall, fscore, support = precision_recall_fscore_support(
    y_test, clf_predictions, labels=[class_of_interest])

# P/R curve using precision_recall_curve, scored with the class-0 probabilities
y_scores = clf.predict_proba(ohe_X_test)[:, class_of_interest]
precision_curve, recall_curve, thresholds = precision_recall_curve(y_test, y_scores)

plt.plot(recall_curve, precision_curve, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve for Class 0')
plt.grid(True)
plt.show()
With this code, the PR curve shows precision increasing as recall increases. What am I doing wrong? It works perfectly when class_of_interest = 1.
Instead of

precision_recall_curve(y_test, y_scores)

it should be

precision_recall_curve(1 - y_test, y_scores)

precision_recall_curve treats label 1 as the positive class by default, so when y_scores holds the class-0 probabilities the labels have to be flipped so that class 0 becomes the positive class.
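A minimal sketch of the corrected curve, assuming y_test is a NumPy array (or pandas Series) of 0/1 labels and reusing clf and ohe_X_test from the question; alternatively, precision_recall_curve accepts pos_label=0, so the labels do not have to be flipped:

from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# Probability of class 0 for each test sample
y_scores_0 = clf.predict_proba(ohe_X_test)[:, 0]

# Option 1: flip the labels so class 0 is coded as the positive class (1)
precision_curve, recall_curve, thresholds = precision_recall_curve(1 - y_test, y_scores_0)

# Option 2: keep the labels and declare class 0 the positive class
# precision_curve, recall_curve, thresholds = precision_recall_curve(y_test, y_scores_0, pos_label=0)

plt.plot(recall_curve, precision_curve, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve for Class 0')
plt.grid(True)
plt.show()

Both calls produce the same curve; pos_label=0 simply avoids the label arithmetic.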