I was reading about the caret package and came across this function:
createDataPartition(y, times = 1, p = 0.5, list = TRUE, groups = min(5, length(y)))
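As far as I understand the caret documentation, `times` is the number of partitions to create, and with `list = FALSE` the result is a matrix with one column of training-row indices per partition. The sketch below mimics that behaviour in base R so it runs without caret installed. `partition_sketch` is a hypothetical helper, not caret's code: it assumes `y` is a factor, samples within each class, and its rounding may differ from caret's.

```r
# Hypothetical base-R stand-in for createDataPartition(y, times, p, list = FALSE).
# Assumption: y is a factor; each partition is sampled separately within each class.
partition_sketch <- function(y, times = 1, p = 0.5) {
  by_class <- split(seq_along(y), y)   # row indices grouped by class level
  one_split <- function() {
    picked <- lapply(by_class, function(idx) {
      # draw about p of the rows of this class (sample.int avoids the length-1 trap of sample())
      idx[sample.int(length(idx), ceiling(length(idx) * p))]
    })
    sort(unlist(picked, use.names = FALSE))
  }
  # each repetition is an independent split, stored as one column
  out <- matrix(unlist(lapply(seq_len(times), function(i) one_split())),
                ncol = times)
  colnames(out) <- paste0("Resample", seq_len(times))
  out
}

set.seed(1)
y <- factor(rep(c("a", "b"), times = c(103, 108)))  # toy labels, 211 rows
idx <- partition_sketch(y, times = 3, p = 0.70)
dim(idx)  # 149 rows (73 + 76), 3 columns
```

If this reading is right, each column is a separate 70/30 split of the same data, intended to be used one at a time (for example as repeated hold-out resamples) rather than stacked together.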
I am wondering what the "times" argument does. Suppose I use this code:
inTrain2 <- createDataPartition(y = MyData$Class, times = 3, p = 0.70, list = FALSE)
training2 <- MyData[inTrain2, ]     # ≈ 67% (train)
testing2 <- MyData[-inTrain2[2], ]  # ≈ 33% (test)
Could this cause an overfitting problem? Or is "times" meant for some kind of (unbiased) resampling method?
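A small toy example of how R treats these two index expressions may help. When the index is a matrix, `df[idx, ]` appears to read it as one long vector (column by column), so rows chosen in several columns come back duplicated; `idx[2]` is just the second element of that matrix, a single row number. The data frame `df` and matrix `idx` below are hypothetical stand-ins for `MyData` and `inTrain2`.

```r
# Toy data frame with 6 rows (hypothetical stand-in for MyData)
df <- data.frame(id = 1:6, x = c(10, 20, 30, 40, 50, 60))

# A 3 x 2 index matrix, shaped like createDataPartition(..., times = 2, list = FALSE)
idx <- matrix(c(1, 2, 4,
                1, 3, 4), ncol = 2)

train <- df[idx, ]      # the matrix is read as the vector c(1, 2, 4, 1, 3, 4)
test  <- df[-idx[2], ]  # idx[2] is the single value 2, so only row 2 is dropped

nrow(train)  # 6: rows 1 and 4 appear twice
nrow(test)   # 5
```

Applied to the question, this would be consistent with 426 = 3 * 142 training rows and 210 = 211 - 1 test rows.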
For comparison, if I use this code:
inTrain2 <- createDataPartition(y = MyData$Class, times = 1, p = 0.70, list = FALSE)
training2 <- MyData[inTrain2, ]    # 142 samples, ≈ 67% (train)
testing2 <- MyData[-inTrain2, ]    # 69 samples, ≈ 33% (test)
I get 211 samples in total and an accuracy of about 52%. On the other hand, if I use this code:
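With `times = 1` the index matrix has a single column, so the negative index removes exactly the training rows and the two sets do not share any row (142 + 69 = 211). The sketch below checks that with a hypothetical 211-row data frame; for brevity it draws the 142 training rows with plain `sample.int` instead of caret's class-stratified sampling.

```r
set.seed(42)
n <- 211
MyData_toy <- data.frame(id = seq_len(n))  # hypothetical stand-in for MyData

# One column of training indices, shaped like times = 1, list = FALSE (not stratified here)
inTrain <- matrix(sort(sample.int(n, 142)), ncol = 1)

# drop = FALSE keeps a one-column data frame from collapsing to a vector
train <- MyData_toy[inTrain, , drop = FALSE]
test  <- MyData_toy[-inTrain, , drop = FALSE]

c(nrow(train), nrow(test))            # 142 69
length(intersect(train$id, test$id))  # 0
```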
inTrain2 <- createDataPartition(y = MyData$Class, times = 3, p = 0.70, list = FALSE)
training2 <- MyData[inTrain2, ]     # ≈ 67% (train), 426 samples
testing2 <- MyData[-inTrain2[2], ]  # ≈ 33% (test), 210 samples
I get 536 samples in total and an accuracy of about 98%.
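One way to check whether the two sets in the `times = 3` version are really separate is to count how many test rows also appear in the training set. The sketch below uses a hypothetical 211-row data frame and three unstratified random draws of 142 rows as a stand-in for the `times = 3` matrix. It compares the indexing pattern from the question with using one column at a time, which is how I would expect the columns to be used if each one is a separate resample (an assumption, not something the question confirms).

```r
set.seed(7)
n <- 211
df <- data.frame(id = seq_len(n))  # hypothetical stand-in for MyData

# Three independent sets of 142 training indices, shaped like times = 3, list = FALSE
inTrain <- sapply(1:3, function(i) sort(sample.int(n, 142)))

# Pattern from the question: all three columns stacked, a single row removed for testing
train_stacked <- df[inTrain, , drop = FALSE]
test_stacked  <- df[-inTrain[2], , drop = FALSE]
overlap <- mean(test_stacked$id %in% train_stacked$id)  # share of test rows seen in training

# One split per column: training and test rows are disjoint within each resample
for (k in seq_len(ncol(inTrain))) {
  train_k <- df[inTrain[, k], , drop = FALSE]
  test_k  <- df[-inTrain[, k], , drop = FALSE]
  stopifnot(nrow(train_k) == 142, nrow(test_k) == 69,
            !any(test_k$id %in% train_k$id))
}
```

If most test rows were also used for training, the test accuracy would not be an unbiased estimate, which could explain part of the jump from about 52% to about 98%.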
Many thanks in advance.