I have this minimal working example in R, which uses the PimaIndiansDiabetes dataset.
#load required library
library(mlbench)
#load Pima Indian Diabetes dataset
data(PimaIndiansDiabetes)
#set seed to ensure reproducible results
set.seed(42)
#split into training and test sets
PimaIndiansDiabetes[,train] <- ifelse(runif(nrow(PimaIndiansDiabetes))
<0.8,1,0)
#separate training and test sets
trainset <- PimaIndiansDiabetes[PimaIndiansDiabetes$train==1,]
testset <- PimaIndiansDiabetes[PimaIndiansDiabetes$train==0,]
#get column index of train flag
trainColNum <- grep("train",names(trainset))
#remove train flag column from train and test sets
trainset <- trainset[,-trainColNum]
testset <- testset[,-trainColNum]
#get column index of predicted variable in dataset
typeColNum <- grep("diabetes",names(PimaIndiansDiabetes))
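My current problem is splitting the data into training and test sets with the ifelse function, using the probability specified in the R code.
There is an error in this line: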
PimaIndiansDiabetes[,train] <- ifelse(runif(nrow(PimaIndiansDiabetes))
<0.8,1,0)
The ifelse call itself works fine:
ifelse(runif(nrow(PimaIndiansDiabetes))
<0.8,1,0)
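But you have to use a string to assign a new column ('train' instead of train). Unquoted, train is treated as a variable name, and no object with that name exists: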
PimaIndiansDiabetes[,'train'] <- ifelse(runif(nrow(PimaIndiansDiabetes))
<0.8,1,0)
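To illustrate the difference, here is a minimal sketch on a small made-up data frame. The name `df` and its values are assumptions for illustration only, not part of the original data, and the sketch assumes no object called `train` exists in the session. It shows the unquoted form failing, the quoted form working, and `$` as an equivalent alternative.

```r
# Toy data frame standing in for PimaIndiansDiabetes (hypothetical values)
df <- data.frame(glucose  = c(148, 85, 183, 89, 137),
                 diabetes = factor(c("pos", "neg", "pos", "neg", "pos")))

set.seed(42)

# Unquoted: R tries to evaluate an object named `train`, which does not exist,
# so the assignment raises an error and df is left unchanged
failed <- tryCatch({
  df[, train] <- 1
  FALSE
}, error = function(e) TRUE)

# Quoted column name: creates the new column
df[, "train"] <- ifelse(runif(nrow(df)) < 0.8, 1, 0)

# Equivalent alternative: `$` also refers to the column by name
df$train2 <- df[, "train"]
```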
The next thing that does not work is the selection of 'trainColNum'. You can do it like this:
trainColNum <- which(colnames(PimaIndiansDiabetes) == 'train')
Or you can use the dplyr package to remove the column:
library(dplyr)
trainset <- trainset %>% select(-train)
testset <- testset %>% select(-train)
The same applies to the diabetes column:
typeColNum <- which(colnames(PimaIndiansDiabetes) == 'diabetes')
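As a rough end-to-end sketch of the corrected steps, the following uses a small made-up data frame so that it runs without mlbench. The name `df`, its values, and the column name `trained_by` are assumptions for illustration. The last line also shows one reason to prefer an exact comparison over grep: grep matches any name that contains the pattern, while `==` matches only the exact name.

```r
# Toy data frame standing in for PimaIndiansDiabetes (hypothetical values)
df <- data.frame(pregnant = 1:10,
                 glucose  = seq(80, 170, by = 10),
                 diabetes = factor(rep(c("neg", "pos"), 5)))

set.seed(42)

# Flag roughly 80% of the rows as training data; note the quoted column name
df[, "train"] <- ifelse(runif(nrow(df)) < 0.8, 1, 0)

# Separate training and test sets
trainset <- df[df$train == 1, ]
testset  <- df[df$train == 0, ]

# Exact-match lookup of the flag column, then drop it from both sets
trainColNum <- which(colnames(df) == "train")
trainset <- trainset[, -trainColNum]
testset  <- testset[, -trainColNum]

# Exact-match lookup of the predicted variable
typeColNum <- which(colnames(df) == "diabetes")

# grep does partial matching: both names below contain "train"
partial <- grep("train", c("train", "trained_by"))
```

Because the split is random, the training share is only about 80% on average; on a data frame this small it can deviate noticeably.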