I have trained a BART model on a binary classification task. One of the predictors is a factor with 18 levels:
```
levels(poke$Type.1)
 [1] "Bug"      "Dark"     "Dragon"   "Electric" "Fairy"    "Fighting"
 [7] "Fire"     "Flying"   "Ghost"    "Grass"    "Ground"   "Ice"
[13] "Normal"   "Poison"   "Psychic"  "Rock"     "Steel"    "Water"
```
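When I inspect the BART object, specifically the `varcount.mean` array, to determine variable importance, I get factor names that are hard to interpret (note the type names):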
```
bart1$varcount.mean
 Type.11  Type.12  Type.13  Type.14  Type.15  Type.16
   1.930    1.825    1.782    1.804    1.983    1.864
 Type.17  Type.18  Type.19 Type.110 Type.111 Type.112
   1.913    0.000    1.950    1.977    1.983    1.987
Type.113 Type.114 Type.115 Type.116 Type.117 Type.118
   2.105    2.004    1.871    2.102    1.906    2.342
   Total       HP   Attack  Defense  Sp..Atk  Sp..Def
   4.400    2.266    2.415    2.175    2.508    2.652
   Speed Generation
   2.711      2.281
```
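Judging from the output, the suffix on each `Type.1k` name looks like the position of the level rather than its label. `gbart`/`mc.gbart` appear to expand the factor into one 0/1 dummy column per level and name each column by pasting the variable name and the level index, so `Type.11` would be level 1 (`"Bug"`) and `Type.118` level 18 (`"Water"`). That naming rule is an assumption inferred from the output above. A minimal base-R sketch of the resulting lookup, with the levels copied from `levels(poke$Type.1)`:

```r
# Levels copied from levels(poke$Type.1) above
type_levels <- c("Bug", "Dark", "Dragon", "Electric", "Fairy", "Fighting",
                 "Fire", "Flying", "Ghost", "Grass", "Ground", "Ice",
                 "Normal", "Poison", "Psychic", "Rock", "Steel", "Water")

# ASSUMED naming rule: variable name pasted to the level's position
dummy_names <- paste0("Type.1", seq_along(type_levels))

# Named lookup: dummy column name -> original level label
type_lookup <- setNames(type_levels, dummy_names)
print(type_lookup[c("Type.11", "Type.18", "Type.118")])
```

Under that assumption, the `0.000` count for `Type.18` belongs to `Flying`. Because the lookup matches whole names, `Type.11` and `Type.111` cannot be confused.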
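My question is: how would you fix this without renaming each row by hand? Is there an argument I can pass to `bart`, or a convenient function for renaming the output rows in the bart object?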
Here is the program I wrote:
```r
library(BART)
library(tidyr)
library(dplyr)
library(ggplot2)
library(ROCR)

pokeB <- read.csv("~/Downloads/Pokemon.csv", header = TRUE)

# Binary response: 1 if the Pokemon is legendary, 0 otherwise.
# as.logical() works whether read.csv returned Legendary as a logical
# column or as the strings "True"/"False".
pokeB$Legend <- as.integer(as.logical(pokeB$Legendary))

poke <- pokeB %>% select(Type.1, Total, HP, Attack, Defense,
                         Sp..Atk, Sp..Def, Speed, Generation,
                         Legend)
poke$Type.1 <- as.factor(poke$Type.1)

# 50/50 train/test split
set.seed(1)
train <- sample(1:nrow(poke), nrow(poke) / 2)
x <- poke %>% select(-Legend)
y <- poke[, "Legend"]
xtrain <- x[train, ]
ytrain <- y[train]
xtest <- x[-train, ]
ytest <- y[-train]

# Probit BART for the binary outcome, run on 4 cores
bart1 <- mc.gbart(xtrain, ytrain, x.test = xtest, type = 'pbart',
                  mc.cores = 4)

# Variable importance: mean split count per variable, largest first
ord1 <- order(bart1$varcount.mean, decreasing = TRUE)
vars1 <- as.data.frame(bart1$varcount.mean[ord1])

# Classify test cases with a 0.5 probability threshold
pred1 <- ifelse(bart1$prob.test.mean > 0.5, 1, 0)
tab1 <- table(ytest, as.factor(pred1))
```
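To see the confusing factor names in the bart output, just look at `vars1`. My goal is either to have the bart object keep the original factor level names, or to fix up the `vars1` data frame in a convenient way (i.e. not row by row) so that it shows the original level names.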
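For the second option (fixing the names after fitting), a small helper can relabel all dummy names in one pass. `relabel_factor_dummies` is a hypothetical function written for this sketch, not part of the BART package, and it assumes the `<variable><level index>` naming inferred from the `varcount.mean` output above. The demo uses a few of the values printed in that output:

```r
# HYPOTHETICAL helper (not a BART function): rename "<var><k>" dummy names
# to "<var>:<k-th level>", leaving every other name untouched.
# Assumes dummies are named by pasting the variable name and level index.
relabel_factor_dummies <- function(nms, var, lv) {
  dummy <- paste0(var, seq_along(lv))   # e.g. "Type.11", ..., "Type.118"
  hit <- match(nms, dummy)              # exact whole-name matches; NA otherwise
  nms[!is.na(hit)] <- paste0(var, ":", lv[hit[!is.na(hit)]])
  nms
}

type_levels <- c("Bug", "Dark", "Dragon", "Electric", "Fairy", "Fighting",
                 "Fire", "Flying", "Ghost", "Grass", "Ground", "Ice",
                 "Normal", "Poison", "Psychic", "Rock", "Steel", "Water")

# A few entries copied from the varcount.mean output above
counts <- c(Type.11 = 1.930, Type.18 = 0.000, Type.118 = 2.342,
            Total = 4.400, Speed = 2.711)
names(counts) <- relabel_factor_dummies(names(counts), "Type.1", type_levels)
print(counts)
```

With the fitted model, the same call would presumably be applied before sorting, e.g. `names(bart1$varcount.mean) <- relabel_factor_dummies(names(bart1$varcount.mean), "Type.1", levels(poke$Type.1))`, so that `vars1`, built afterwards, carries the relabelled row names.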
Thanks for reading.
Change `poke$Type.1 <- as.factor(poke$Type.1)` to `poke$Type.1 <- as.character(poke$Type.1)`.