ANALYSIS - without cross-validation, 1000 trees
FUNCTION:
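InvestRandomForest2.f <- function(x, Y, NoOfSeeds, mtry.v, nodesize.v, ntree) {
  library(randomForest)  # load once, not inside the innermost loop
  k <- length(mtry.v)
  m <- length(nodesize.v)
  # one 2 x 3 confusion matrix (two count columns plus class.error)
  # per seed, mtry value and nodesize value
  Table.arr <- array(0, c(2, 3, NoOfSeeds, k, m))
  for (j in 1:NoOfSeeds) {
    set.seed(99 + j)
    for (i in 1:k) {
      for (ind in 1:m) {
        # FitObj is local to this function and is gone once the call returns
        FitObj <- randomForest(x, Y, mtry = mtry.v[i],
                               nodesize = nodesize.v[ind],
                               ntree = ntree)
        Table.arr[, , j, i, ind] <- FitObj$confusion
      }
    }
  }
  Table.arr
}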
APPLY FUNCTION:
Table.arr <- InvestRandomForest2.f(x, Y, NoOfSeeds = 50,
                                   mtry.v = 1:5,
                                   nodesize.v = 1:20,
                                   ntree = 1000)
save(Table.arr,file="Table.arr")
Table.arr
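USING APPLY STATEMENT, PREPARE FOR GRAPHS:
# misclassification rate from the two count columns of a confusion matrix
Error2.f <- function(mat)
  1 - sum(diag(mat[, 1:2])) / sum(mat[, 1:2])
load("Table.arr")
Error.arr <- apply(Table.arr, 3:5, Error2.f)
Error.arr  # 50 x 5 x 20 array: seeds x mtry values x nodesize values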
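As a quick check of Error2.f, here is a small sketch that applies it to a confusion matrix typed in by hand from the counts in the FitObj printout in this post (167, 33, 56, 89). The names conf.mat and toy.arr are made up for illustration, and toy.arr only mimics the layout of Table.arr; it is not the real result.

```r
# Misclassification rate from a randomForest-style confusion matrix
# (columns 1:2 hold counts, column 3 holds class.error)
Error2.f <- function(mat)
  1 - sum(diag(mat[, 1:2])) / sum(mat[, 1:2])

# Hypothetical matrix, filled column by column from the printed output
conf.mat <- matrix(c(167, 56, 33, 89, 0.1650000, 0.3862069), nrow = 2,
                   dimnames = list(c("0", "1"), c("0", "1", "class.error")))
err <- Error2.f(conf.mat)
round(100 * err, 1)   # 25.8, the printed OOB error rate

# Toy array with the same layout as Table.arr:
# 2 x 3 x seeds x mtry values x nodesize values (conf.mat is recycled)
toy.arr <- array(conf.mat, c(2, 3, 50, 5, 20))
Error.arr <- apply(toy.arr, 3:5, Error2.f)
dim(Error.arr)        # 50 5 20
```

With this layout, Error.arr[, i, ind] would hold the 50 per-seed error rates for the i-th mtry value and the ind-th nodesize value, which is the shape matplot needs.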
DO MATPLOT GRAPHICS HERE
FitObj OUTPUT SPECIFIES 500 TREES, NOT 1000?
FitObj
Call:
 randomForest(x = x, y = Y, type = "prob")
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 25.8%
Confusion matrix:
    0  1 class.error
0 167 33   0.1650000
1  56 89   0.3862069
PLOT ALSO RESULTS IN 500, NOT 1000 TREES?
plot(FitObj, lwd = 2)
abline(h = 0.26, lty = 3, lwd = 2)
legend(x = "topright",
       legend = c("without cross-validation", "1000 trees"),
       lty = c(1),     # Line types
       col = c(3, 1),  # Line colors
       lwd = 2)
OUTPUT:
Call:
 randomForest(x = x, y = Y, type = "prob")
               Type of random forest: classification
                     Number of trees: 500    <- should be 1000?
No. of variables tried at each split: 2

        OOB estimate of  error rate: 25.8%
Confusion matrix:
    0  1 class.error
0 167 33   0.1650000
1  56 89   0.3862069
WHAT IS GOING ON HERE?
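One possible reading, offered as a guess rather than a diagnosis: the Call line in the printout is randomForest(x = x, y = Y, type = "prob"), with no mtry, nodesize or ntree. So the FitObj being printed and plotted is probably a different object, created earlier in the workspace with randomForest's default of 500 trees. The FitObj assigned inside InvestRandomForest2.f is local to the function and never reaches the workspace. The sketch below illustrates only that scoping behaviour in base R; Demo.f and the list stand-ins are hypothetical and no forest is fitted.

```r
# Stand-in for an older fit left in the workspace (hypothetical object)
FitObj <- list(ntree = 500)

Demo.f <- function(ntree) {
  # This assignment creates a local FitObj; the global one is untouched
  FitObj <- list(ntree = ntree)
  FitObj$ntree
}

inside <- Demo.f(ntree = 1000)
inside         # 1000, the value seen inside the function
FitObj$ntree   # 500, the workspace object is unchanged
```

If the 1000-tree fit itself is needed for print() and plot(), one option would be to refit once at top level with ntree = 1000, or to have the function return the fitted object alongside Table.arr.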