Confusion matrix for PLSDA in R: "Error: `data` and `reference` should be factors with the same levels."


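I am trying to run a partial least squares discriminant analysis (PLSDA) in R with the mdatools package, followed by a prediction and a confusion matrix. The mdatools::plsda function works, and so does stats::predict, but I am stuck on caret::confusionMatrix.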

Here is a reproducible example:

library(tidyverse)
library(mdatools)
library(caret)

#GetData

data(iris)

#Sampling 50/50 for train and valid

set.seed(2000)
Train <- iris %>% group_by(Species) %>% sample_frac(.5, replace = FALSE)
Valid <- anti_join(iris, Train)

#Separating categorical variable from numerical variables

PLSDA_Response = as.matrix(Train[,5])
PLSDA_Predictors = as.matrix(Train[,c(1:4)])

Predict_Response = as.matrix(Valid[,5])
Predict_Predictors = as.matrix(Valid[,c(1:4)])

#Run PLS-DA

pls_comp <- mdatools::plsda(PLSDA_Predictors, as.factor(PLSDA_Response), ncomp = 1)

#Run the prediction

pls_comp_predict <- stats::predict(pls_comp, Predict_Predictors)

#Run the Confusion Matrix

pls_CM <- caret::confusionMatrix(pls_comp_predict, as.factor(Predict_Response))

The message I get is:

Error: `data` and `reference` should be factors with the same levels.

Can anyone help me? Thanks! :)

r r-caret predict confusion-matrix pls
1 Answer
# Run PLS-DA
pls_comp <- mdatools::plsda(PLSDA_Predictors, as.factor(PLSDA_Response), ncomp = 1)

# Run the prediction
pls_comp_predict <- mdatools:::predict.pls(pls_comp, x=Predict_Predictors)
y_pred <- pls_comp_predict$y.pred[,1,1:3]

# Distribution of the predicted outcome across the three classes 
# (virginica, versicolor, setosa) and two reasonable cutoffs
boxplot(y_pred[,1]~as.factor(Predict_Response))
abline(a=0.25, b=0, col="red", lty=2)
abline(a=-0.9, b=0, col="red", lty=2)


# Run the Confusion Matrix
class_pred <- cut(y_pred[,1], breaks=c(-2,-.9,0,2), labels=c("virginica","versicolor","setosa"))
class_ref  <- factor(Predict_Response, levels=c("virginica","versicolor","setosa"))

pls_CM <- caret::confusionMatrix(class_pred, class_ref)
print(pls_CM)
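
The key step in the code above is turning the continuous PLS response into a factor whose levels match those of the reference. As far as I can tell, the original call failed because predict() on a plsda model returns a result object rather than a factor, so confusionMatrix had nothing it could compare against the reference. The following is a minimal base-R sketch of the same idea; the scores and truth vectors are made-up toy values (not taken from the iris run), and table() stands in for caret::confusionMatrix so the example has no package dependencies.

```r
# Hypothetical continuous scores standing in for y_pred[, 1]
scores <- c(-1.4, -0.95, -0.5, -0.1, 0.6, 1.1)

# Hypothetical true labels for the same six rows
truth <- c("virginica", "virginica", "versicolor",
           "versicolor", "setosa", "setosa")

# One shared level vector, used for both prediction and reference
class_levels <- c("virginica", "versicolor", "setosa")

# cut() uses right-closed intervals by default:
# (-2, -0.9] -> virginica, (-0.9, 0] -> versicolor, (0, 2] -> setosa
pred_factor <- cut(scores, breaks = c(-2, -0.9, 0, 2), labels = class_levels)

# Build the reference factor with the same levels in the same order
ref_factor <- factor(truth, levels = class_levels)

# Both objects are now factors with identical levels
stopifnot(is.factor(pred_factor), is.factor(ref_factor))
stopifnot(identical(levels(pred_factor), levels(ref_factor)))

# Base-R cross-tabulation: rows are predictions, columns are references
conf_tab <- table(Prediction = pred_factor, Reference = ref_factor)
print(conf_tab)
```

Scores that fall outside the outermost breaks become NA under cut(), so the breaks should span the full range of the predicted values.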

This is the confusion matrix printed by print(pls_CM):

Confusion Matrix and Statistics

            Reference
Prediction   virginica versicolor setosa
  virginica         21          1      0
  versicolor         3         24      0
  setosa             0          0     25

Overall Statistics
                                          
               Accuracy : 0.9459          
                 95% CI : (0.8673, 0.9851)
    No Information Rate : 0.3378          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.9189          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: virginica Class: versicolor Class: setosa
Sensitivity                    0.8750            0.9600        1.0000
Specificity                    0.9800            0.9388        1.0000
Pos Pred Value                 0.9545            0.8889        1.0000
Neg Pred Value                 0.9423            0.9787        1.0000
Prevalence                     0.3243            0.3378        0.3378
Detection Rate                 0.2838            0.3243        0.3378
Detection Prevalence           0.2973            0.3649        0.3378
Balanced Accuracy              0.9275            0.9494        1.0000
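
To make the reported statistics easier to read, here is a small base-R sketch that recomputes a few of them directly from the 3 x 3 table above. The counts are copied from the printed output; the variable names are my own, and the formulas are the usual one-versus-rest definitions, which I am assuming match what caret reports per class.

```r
classes <- c("virginica", "versicolor", "setosa")

# Counts copied from the printed table (rows = prediction, cols = reference)
cm <- matrix(c(21,  1,  0,
                3, 24,  0,
                0,  0, 25),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n  <- sum(cm)    # 74 validation rows
tp <- diag(cm)   # correctly classified rows per class

# Overall accuracy: correct predictions over all predictions
accuracy <- sum(tp) / n

# Per-class statistics, treating each class as "positive" in turn
sensitivity <- tp / colSums(cm)                   # TP / (TP + FN)
pos_pred    <- tp / rowSums(cm)                   # TP / (TP + FP)
tn          <- n - rowSums(cm) - colSums(cm) + tp # true negatives
specificity <- tn / (n - colSums(cm))             # TN / (TN + FP)
balanced    <- (sensitivity + specificity) / 2

print(round(accuracy, 4))
print(round(rbind(sensitivity, specificity, pos_pred, balanced), 4))
```

The rounded values agree with the Accuracy, Sensitivity, Specificity, Pos Pred Value and Balanced Accuracy rows shown above, for example 70/74 = 0.9459 for accuracy and 21/24 = 0.875 for the sensitivity of virginica.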