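I am trying to run a partial least squares discriminant analysis (PLS-DA) in R with the mdatools package, followed by a prediction and a confusion matrix. The mdatools::plsda function works fine, and so does stats::predict, but I am having trouble with caret::confusionMatrix.
Here is a reproducible example: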
library(tidyverse)
library(mdatools)
library(caret)
#GetData
data(iris)
#Sampling 50/50 for train and valid
set.seed(2000)
Train <- iris %>% group_by(Species) %>% sample_frac(.5, replace = FALSE)
Valid <- anti_join(iris, Train)
#Separating categorical variable from numerical variables
PLSDA_Response = as.matrix(Train[,5])
PLSDA_Predictors = as.matrix(Train[,c(1:4)])
Predict_Response = as.matrix(Valid[,5])
Predict_Predictors = as.matrix(Valid[,c(1:4)])
#Run PLS-DA
pls_comp <- mdatools::plsda(PLSDA_Predictors, as.factor(PLSDA_Response), ncomp = 1)
#Run the prediction
pls_comp_predict <- stats::predict(pls_comp, Predict_Predictors)
#Run the Confusion Matrix
pls_CM <- caret::confusionMatrix(pls_comp_predict, as.factor(Predict_Response))
The message I get is: Error: `data` and `reference` should be factors with the same levels.
Can anyone help me? Thanks! :)
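The error message says that caret::confusionMatrix expects two factors with identical levels, whereas the object returned by predict on an mdatools model appears to be a result object rather than a factor of class labels. A minimal base-R sketch of that precondition follows; the toy vectors and the names `pred_labels`, `ref_labels`, and `lv` are hypothetical and not taken from the code above.

```r
# Toy labels (hypothetical); in practice these would come from the
# model predictions and from the validation set
pred_labels <- c("setosa", "virginica", "versicolor", "setosa")
ref_labels  <- c("setosa", "virginica", "virginica", "setosa")

# Build both factors from one shared set of levels
lv <- c("setosa", "versicolor", "virginica")
pred_f <- factor(pred_labels, levels = lv)
ref_f  <- factor(ref_labels,  levels = lv)

# The two conditions named in the error message
stopifnot(is.factor(pred_f), is.factor(ref_f))
stopifnot(identical(levels(pred_f), levels(ref_f)))

# Base-R cross-tabulation: rows are predictions, columns are references
cm <- table(Prediction = pred_f, Reference = ref_f)
print(cm)
```

Two factors built this way can be passed to caret::confusionMatrix as its first two arguments.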
# Run PLS-DA
pls_comp <- mdatools::plsda(PLSDA_Predictors, as.factor(PLSDA_Response), ncomp = 1)
# Run the prediction
pls_comp_predict <- mdatools:::predict.pls(pls_comp, x=Predict_Predictors)
y_pred <- pls_comp_predict$y.pred[,1,1:3]
# Distribution of the predicted outcome across the three classes
# (virginica, versicolor, setosa) and two reasonable cutoffs
boxplot(y_pred[,1]~as.factor(Predict_Response))
abline(a=0.25, b=0, col="red", lty=2)
abline(a=-0.9, b=0, col="red", lty=2)
# Run the Confusion Matrix
class_pred <- cut(y_pred[,1], breaks=c(-2,-.9,0,2), labels=c("virginica","versicolor","setosa"))
class_ref <- factor(Predict_Response, levels=c("virginica","versicolor","setosa"))
pls_CM <- caret::confusionMatrix(class_pred, class_ref)
print(pls_CM)
This is the resulting confusion matrix:
Confusion Matrix and Statistics

            Reference
Prediction   virginica versicolor setosa
  virginica         21          1      0
  versicolor         3         24      0
  setosa             0          0     25

Overall Statistics

               Accuracy : 0.9459
                 95% CI : (0.8673, 0.9851)
    No Information Rate : 0.3378
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9189

 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: virginica Class: versicolor Class: setosa
Sensitivity                    0.8750            0.9600        1.0000
Specificity                    0.9800            0.9388        1.0000
Pos Pred Value                 0.9545            0.8889        1.0000
Neg Pred Value                 0.9423            0.9787        1.0000
Prevalence                     0.3243            0.3378        0.3378
Detection Rate                 0.2838            0.3243        0.3378
Detection Prevalence           0.2973            0.3649        0.3378
Balanced Accuracy              0.9275            0.9494        1.0000
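As an alternative to hand-picked cutoffs on a single column, the class with the largest predicted response in each row could be used as the predicted label. This is a different technique from the one above and may give different results. The sketch below uses a made-up matrix, `y_pred_toy`, and assumes the columns are named after the classes; whether the real `y_pred` carries such column names is an assumption to check.

```r
# Hypothetical predicted responses: one row per sample, one column per class
y_pred_toy <- matrix(
  c( 0.9, -0.8, -1.1,
    -1.0,  0.2, -0.3,
    -0.9, -0.6,  0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)

# max.col() gives, for each row, the column index of the largest value
best <- max.col(y_pred_toy, ties.method = "first")

# Map indices to class names and keep a fixed, complete set of levels
class_pred_toy <- factor(colnames(y_pred_toy)[best],
                         levels = colnames(y_pred_toy))
print(class_pred_toy)
```

Because the levels are taken from the column names, the resulting factor keeps all three classes even if one of them is never predicted, which is what the reference factor needs to match.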