Help using caret to run 5-fold CV on LDA,QDA and KNN while also creating a confusion matrix

eyov7 · March 9, 2020, 11:04pm

This first part basically shows what I did to run a LOGIT model and construct a confusion matrix. I was going to provide an image of my results, however new users are limited to 1 photo. In summary I had 26 true positives,11 false negatives,2 false positives and 6 true negatives.

library(caret)
library(MASS)
library(pROC)
# create training and testing data
set.seed(123)
trainIndex = createDataPartition(BCGsub$Status, p = .8, 
                                  list = FALSE, 
                                  times = 1)
BCGsubTrain = BCGsub[ trainIndex,]
BCGsubTest  = BCGsub[-trainIndex,]
##summary(BCGsubTest)

# Fit LOGIT model and choose best model using stepAIC
glm.model=glm(Status~.,data= BCGsubTrain,family="binomial")
summary(glm.model)
glm.fit= stepAIC(glm.model, direction = 'backward')
glm.fit

# Make predictions on test data and construct a confusion matrix
probs.glm = predict(glm.fit,newdata = BCGsubTest ,type = "response")
#range(probs)
pred.glm = rep("NoRelapse", length(probs.glm))
pred.glm[probs > 0.5] = "Relapse"
table(pred.glm,BCGsubTest$Status)

Using what I have done I tried to implement my work into the next question. The next question asked to first normalize the training and testing data. Apply a 5-fold cross validation method on the training set using trainControl() with “CV” method in the caret package.Perform lda, qda and knn classification methods using train() in caret package And to make the predictions on the test data set, While also creating the confusion matrix.

I got reference material online to normalize my data from
https://machinelearningmastery.com/pre-process-your-dataset-in-r/

I need help debugging the code.

If I run the predict() using "raw" I get the same exact results from my LOGIT confusion matrix for all three models, which is very fishy and if I run "probs" I don't get the right number of predictions.

set.seed(123)
trainIndex = createDataPartition(BCGsub$Status, p = .8, 
                                  list = FALSE, 
                                  times = 1)
BCGsubTrain = BCGsub[ trainIndex,]
BCGsubTest  = BCGsub[-trainIndex,]

# normalize the training and testing data

## calculate the pre-process parameters from the dataset
preprocessParams1 = preProcess(BCGsubTrain, method=c("range"))
preprocessParams2 = preProcess(BCGsubTest, method=c("range"))

## transform the dataset using the parameters
BCGsubTrainNorm = predict(preprocessParams1, BCGsubTrain)
BCGsubTestNorm = predict(preprocessParams2, BCGsubTest)

## summarize the transformed dataset
###summary(BCGsubTrainNorm)
###summary(BCGsubTestNorm)

# run a 5-fold Cross-Validation
set.seed(123) 
train.control = trainControl(method = "cv", number = 5)

# perform LDA, QDA,KNN
lda.model = train(Status~.,data= BCGsubTrainNorm, method = "lda",
               trControl = train.control)

qda.model = train(Status~.,data= BCGsubTrainNorm, method = "qda",
               trControl = train.control)

knn.model = train(Status~.,data= BCGsubTrainNorm, method = "knn",
               trControl = train.control)

# Make predictions on test data and construct a confusion matrix

probs.lda = predict(lda.model,newdata = BCGsubTestNorm ,type = "response")
#range(probs.lda)
pred.lda = rep("NoRelapse", length(probs.lda))
pred.lda[probs > 0.5] = "Relapse"
table(pred.lda,BCGsubTestNorm$Status) # hw more correct

probs.qda = predict(qda.model,newdata = BCGsubTestNorm ,type = "response")
#range(probs.qda)
pred.qda = rep("NoRelapse", length(probs.qda))
pred.qda[probs > 0.5] = "Relapse"
table(pred.qda,BCGsubTestNorm$Status) # hw more correct

probs.knn = predict(knn.model,newdata = BCGsubTestNorm ,type = "response")
#range(probs.knn)
pred.knn = rep("NoRelapse", length(probs.knn))
pred.knn[probs > 0.5] = "Relapse"
table(pred.knn,BCGsubTestNorm$Status) # hw more correct

Output:

Max · March 17, 2020, 3:56pm

The class of objects generated by train are "train". When you predict with them, you are using predict.train() and not the predict() from the original package. predict.train() does not have type = "response"; we have standardized on type = "prob".

system · April 7, 2020, 3:56pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.