Probabilities calibration in caret

jpatteet · July 14, 2022, 12:06pm

Hello,

I am using glmnet to classify data of two differents classes with different predictors.
I used here the iris dataset so that you can try it yourself. My dataset structure is merely the same.

For that I am deviding my data set into a training set and a test using CreatDataPartition function. I do some machine learning with the caret package using the function train and the glmnet method.
Based on that I predict probabilities of assigning each observation to one of the classes.
I tried calibrating those probabilities because they are not naturally.
Unfortunately the final plot shows that the calibation is not a success.
The Calprob curve should follow the diagonal but it doe not.

my_data  <- iris %>% #reducing the data to have two classes only
  dplyr::filter((Species =="virginica" | Species == "versicolor") ) %>% dplyr::select(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species)

my_data <- droplevels(my_data)

index <- createDataPartition(y=my_data$Species,p=0.6,list=FALSE) 
#creating train and test set for machine learning
Train <- my_data[index,]
Test <-  my_data[-index,]

#machine learning based on Train data partition with glmnet method
classCtrl <- trainControl(method = "repeatedcv", number=10,repeats=5,classProbs =  TRUE,savePredictions = "final")
set.seed(355)
glmnet_ML <- train(Species~., Train, method= "glmnet",  trControl=classCtrl)
glmnet_ML

#probabilities to assign each row of data to one class or the other on Test
predTestprob <- predict(glmnet_ML,Test,type="prob")
pred 


#trying out calibration following "Applied predictive modeling" book from Max Kuhn p266-273
predTrainprob <- predict(glmnet_ML,Train,type="prob")
predTest <- predict(glmnet_ML,Test)
predTestprob <- predict(glmnet_ML,test,type="prob")

Test$PredProb <- predTestprob[,"versicolor"]
Test$Pred <- predTest
Train$PredProb <- predTrainprob[,"versicolor"]

#logistic regression to calibrate
sigmoidalCal <- glm(relevel(Species, ref= "virginica") ~ PredProb,data = Train,family = binomial)
coef(summary(sigmoidalCal))

#predicting calibrated scores
sigmoidProbs <- predict(sigmoidalCal,newdata = Test[,"PredProb", drop = FALSE],type = "response")
Test$CalProb <- sigmoidProbs

#plotting to see if it works
calCurve2 <- calibration(Species ~ PredProb +  CalProb, data = Test)
xyplot(calCurve2,auto.key = list(columns = 2))

Has anyone tried something like that before or knows any other ways ?

system · August 4, 2022, 12:07pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.