Why doesn't confusionMatrix accept these datasets? (Both are binary)
str(logitr)
num [1:384] 0 0 0 0 0 0 0 1 0 0 ...
str(actual)
num [1:384] 0 0 1 1 1 0 0 1 0 0 ...
logit_cm <- confusionMatrix(data=logitr, reference=actual)
Error: `data` and `reference` should be factors with the same levels.
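Here is an example of how to create a confusion matrix from data with caret. Note that both the predictions and the reference outcome must be factors with the same levels. In your data they are numeric, which is what triggers the error.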
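For the two vectors in the question, a minimal sketch of the direct fix might look like the following. It uses only the first ten values visible in the str() output as stand-in data, and the names lv, pred_fct, ref_fct and cm_table are illustrative, not from the original post. Setting the levels explicitly is a precaution so that both factors share the same level set even if one vector happens to contain only a single class.

```r
# Stand-in data: the first ten values shown in the question's str() output
logitr <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
actual <- c(0, 0, 1, 1, 1, 0, 0, 1, 0, 0)

# Convert both numeric vectors to factors with an identical, explicit level set
lv <- c(0, 1)
pred_fct <- factor(logitr, levels = lv)
ref_fct  <- factor(actual, levels = lv)

# Base R cross-tabulation; works without caret
cm_table <- table(Prediction = pred_fct, Reference = ref_fct)
print(cm_table)

# If caret is installed, the original call now accepts the inputs
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(data = pred_fct, reference = ref_fct,
                               positive = "1"))
}
```

The same two factor() calls applied to the full 384-element vectors should be enough to get past the error.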
First, prepare the data and train a deliberately simple model:
library(caret)
# Make factor
training_data$Outcome_fct <- factor(training_data$Outcome)
# Train simple model
default_glm_mod = train(
form = Outcome_fct ~ Glucose,
data = training_data,
trControl = trainControl(method = "none"),
method = "glm",
family = "binomial"
)
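Then create the confusion matrix. Which class is treated as positive does not affect accuracy, but metrics such as precision, recall, and F1 depend on it, so specify the positive class if you plan to use them.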
caret::confusionMatrix(
data = predict(default_glm_mod),
reference = training_data$Outcome_fct,
positive = "1" # set positive class
)
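Output (note that this printout lists 0 as the 'Positive' class, so it appears to come from a call without the positive argument; with positive = "1", Sensitivity and Specificity would swap values, as would Pos Pred Value and Neg Pred Value):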
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 199  74
         1  40  71
Accuracy : 0.7031
95% CI : (0.6547, 0.7484)
No Information Rate : 0.6224
P-Value [Acc > NIR] : 0.0005585
Kappa : 0.3379
Mcnemar's Test P-Value : 0.0019966
Sensitivity : 0.8326
Specificity : 0.4897
Pos Pred Value : 0.7289
Neg Pred Value : 0.6396
Prevalence : 0.6224
Detection Rate : 0.5182
Detection Prevalence : 0.7109
Balanced Accuracy : 0.6611
'Positive' Class : 0
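To see why the choice of positive class matters, the statistics can be recomputed by hand from the four cell counts printed above. The sketch below uses base R only; class_metrics is a hypothetical helper written for this illustration, not a caret function.

```r
# Cell counts copied from the confusion matrix printed above
# (rows = predicted class, columns = reference class)
cm <- matrix(c(199, 40, 74, 71), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# Hypothetical helper: metrics for a chosen positive class
class_metrics <- function(cm, positive) {
  negative <- setdiff(colnames(cm), positive)
  tp <- cm[positive, positive]  # predicted positive, actually positive
  fn <- cm[negative, positive]  # predicted negative, actually positive
  tn <- cm[negative, negative]  # predicted negative, actually negative
  fp <- cm[positive, negative]  # predicted positive, actually negative
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    precision   = tp / (tp + fp))
}

# Accuracy does not depend on which class is called positive
accuracy <- sum(diag(cm)) / sum(cm)

m0 <- class_metrics(cm, "0")  # reproduces the statistics printed above
m1 <- class_metrics(cm, "1")  # sensitivity and specificity trade places

print(round(accuracy, 4))
print(round(rbind(positive_0 = m0, positive_1 = m1), 4))
```

Accuracy is identical in both cases, while sensitivity and specificity exchange values when the positive class changes.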