Consider the following reproducible example. Note the imbalanced target variable.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold, cross_val_score
# generate 2 class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42, weights=[.95])
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2, random_state=2)
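def evaluate_model(X, y, model):
    # repeated stratified 10-fold CV keeps the class ratio in every fold
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
    # 'roc_auc' scores the area under the ROC curve, not the PR curve
    scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
    return scores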
model = LogisticRegression(solver='liblinear')
scores = evaluate_model(X=trainX, y=trainy, model=model)
scores
I don't believe this scorer is measuring the AUC of the precision-recall curve. How would one implement such a scorer for cross-validation?
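Average precision (scoring='average_precision') is probably what you want; it measures a non-interpolated area under the PR curve. See the example in the User Guide
and the last few paragraphs of the corresponding section.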
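
For what it's worth, here is a minimal sketch of how that could look with the setup from the question. The only real change is the scoring string; the function name evaluate_model_pr and the variable pr_scores are made up for this illustration, and n_jobs is left at its default just to keep the snippet simple.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    cross_val_score,
    train_test_split,
)

# Same imbalanced dataset and split as in the question.
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42, weights=[.95])
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2, random_state=2)


def evaluate_model_pr(X, y, model):
    """Cross-validate with average precision instead of ROC AUC.

    Hypothetical helper name; it mirrors evaluate_model from the question.
    """
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
    # 'average_precision' summarizes the precision-recall curve per fold.
    return cross_val_score(model, X, y, scoring='average_precision', cv=cv)


model = LogisticRegression(solver='liblinear')
pr_scores = evaluate_model_pr(trainX, trainy, model)
# One score per fold and repeat: 10 splits x 3 repeats.
print(pr_scores.mean(), pr_scores.std())
```

Note that average precision is not the same number you would get from trapezoidal integration of the PR curve, so expect small differences if you compare the two.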