Hello,
I am using glmnet to classify data of two differents classes with different predictors.
I used here the iris dataset so that you can try it yourself. My dataset structure is merely the same.
For that I am deviding my data set into a training set and a test using CreatDataPartition function. I do some machine learning with the caret package using the function train and the glmnet method.
Based on that I predict probabilities of assigning each observation to one of the classes.
I tried calibrating those probabilities because they are not naturally.
Unfortunately the final plot shows that the calibation is not a success.
The Calprob curve should follow the diagonal but it doe not.
my_data <- iris %>% #reducing the data to have two classes only
dplyr::filter((Species =="virginica" | Species == "versicolor") ) %>% dplyr::select(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species)
my_data <- droplevels(my_data)
index <- createDataPartition(y=my_data$Species,p=0.6,list=FALSE)
#creating train and test set for machine learning
Train <- my_data[index,]
Test <- my_data[-index,]
#machine learning based on Train data partition with glmnet method
classCtrl <- trainControl(method = "repeatedcv", number=10,repeats=5,classProbs = TRUE,savePredictions = "final")
set.seed(355)
glmnet_ML <- train(Species~., Train, method= "glmnet", trControl=classCtrl)
glmnet_ML
#probabilities to assign each row of data to one class or the other on Test
predTestprob <- predict(glmnet_ML,Test,type="prob")
pred
#trying out calibration following "Applied predictive modeling" book from Max Kuhn p266-273
predTrainprob <- predict(glmnet_ML,Train,type="prob")
predTest <- predict(glmnet_ML,Test)
predTestprob <- predict(glmnet_ML,test,type="prob")
Test$PredProb <- predTestprob[,"versicolor"]
Test$Pred <- predTest
Train$PredProb <- predTrainprob[,"versicolor"]
#logistic regression to calibrate
sigmoidalCal <- glm(relevel(Species, ref= "virginica") ~ PredProb,data = Train,family = binomial)
coef(summary(sigmoidalCal))
#predicting calibrated scores
sigmoidProbs <- predict(sigmoidalCal,newdata = Test[,"PredProb", drop = FALSE],type = "response")
Test$CalProb <- sigmoidProbs
#plotting to see if it works
calCurve2 <- calibration(Species ~ PredProb + CalProb, data = Test)
xyplot(calCurve2,auto.key = list(columns = 2))
Has anyone tried something like that before or knows any other ways ?