Why does tidymodels take significantly longer for cross-validation and hyperparameter tuning than glmnet?

MatthiasHerp · July 28, 2022, 12:44pm

Hello,
the goal of my project is to implement multiple ML algorithms, one of them being an elastic net logistic regression. In order to make my code as simple as possible (run for different specified models), I want to use tidymodels.

For the elastic net logistic regression I set a fixed mixture of 0.7 and want to find the optimal penalty lambda. As its estimate can be unstable, I repeat the CV multiple times (50). This runs quite fast with glmnet, but it takes much longer with tidymodels.

I suspect it has to do with how the grid search works: tidymodels tries all penalties, while glmnet uses a targeted search. See this passage from the cv.glmnet help page:
"glmnet chooses its own sequence. Note that this is done for the full model (master sequence), and separately for each fold. The fits are then aligned using the master sequence (see the alignment argument for additional details). Adapting lambda for each fold leads to better convergence."

Does anybody have an idea how to make tidymodels use the CV path of cv.glmnet? Or am I missing something? This might also result in a feature request for the package.

Side note: both approaches result in similar hyperparameters and thus similar predictions, so it's not that different models are being fit.

Here is a reproducible example on a dummy dataset with multiple different tuning grids, to show what I mean:

library(glmnet)
library(tidymodels)

data(BinomialExample)

## glmnet
set.seed(42)

X_batch <- BinomialExample$x |>
  as.matrix()
y_batch <- BinomialExample$y

tictoc::tic("glmnet.cv:")
MSE <- NULL
for (j in 1:50){
  cv_fit <- glmnet::cv.glmnet(X_batch, y_batch, family=c("binomial"), alpha = 0.7, type.measure = "mse", nfolds = 10)
  MSE <- cbind(MSE, cv_fit$cvm)
}
tictoc::toc() # 7.633 sec elapsed

## Tidymodels
set.seed(42)

data_train <- data.frame(cbind(BinomialExample$x,BinomialExample$y))
colnames(data_train) <- c(seq(1,30),"y")
data_train$y <- factor(data_train$y)

cv_splits <- vfold_cv(data_train, v=10, repeats = 10, strata = "y")

mod <- logistic_reg(
  mode = "classification",
  engine = "glmnet",
  penalty = tune(),
  mixture = 0.7)

rec <- recipe(y ~ ., data = data_train) |>
  step_normalize(all_numeric_predictors())

wfl <- workflow() %>%
  add_recipe(rec) %>%
  add_model(mod)

grid1 <- grid_regular(penalty(), levels = 50)

grid2 <- grid_regular(penalty(range = c(-5,1), trans = log10_trans()), levels = 50)

grid3 <- grid_regular(penalty())

grid4 <- grid_regular(penalty(range = c(-5,1), trans = log10_trans()))

tictoc::tic("grid1:")
tune_results1 <- wfl |>
  tune_grid(resamples = cv_splits,
            grid = grid1,
            metrics = metric_set(accuracy, roc_auc))
tictoc::toc() #57.014 sec elapsed

tictoc::tic("grid2:")
tune_results2 <- wfl |>
  tune_grid(resamples = cv_splits,
            grid = grid2,
            metrics = metric_set(accuracy, roc_auc))
tictoc::toc() #58.22 sec elapsed

tictoc::tic("grid3:")
tune_results3 <- wfl |>
  tune_grid(resamples = cv_splits,
            grid = grid3,
            metrics = metric_set(accuracy, roc_auc))
tictoc::toc() #46.867 sec elapsed

tictoc::tic("grid4:")
tune_results4 <- wfl |>
  tune_grid(resamples = cv_splits,
            grid = grid4,
            metrics = metric_set(accuracy, roc_auc))
tictoc::toc() #46.201 sec elapsed

Max · July 28, 2022, 3:16pm

This is a really good question and I hope my answer doesn't come off as snarky...

It is the cost of using a framework that will let you tune any model (and any preprocessing parameters) with any resampling method and compute any performance metric, versus a function that does one model, without tuning preprocessing, for a very specific resampling method and limited measures of performance.

Not a knock on glmnet; it's the difference between a framework and a function.

We've written a lot on how we use glmnet here. We, like caret before tidymodels, exploit the whole sub-model trick for many models, including glmnet.
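For readers unfamiliar with the sub-model trick, here is a minimal sketch of the idea using glmnet directly. It is only an illustration, not the tidymodels internals: the simulated data and the three penalty values are assumptions made up for this example. The point is that a single glmnet fit contains the whole penalty path, so predictions for many penalties can be pulled out without refitting.

```r
library(glmnet)

# Simulated data (an assumption for illustration, not the thread's dataset)
set.seed(1)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))

# One call fits the whole regularization path at mixture (alpha) 0.7
fit <- glmnet(x, y, family = "binomial", alpha = 0.7)

# Predictions for several penalties come from that single fit; glmnet
# interpolates when a requested value is not exactly on the fitted path
penalties <- c(0.001, 0.01, 0.1)
probs <- predict(fit, newx = x, s = penalties, type = "response")

dim(probs)  # one row per observation, one column per penalty
```

Evaluating 50 candidate penalties therefore costs roughly one model fit per resample rather than 50.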

tidymodels does use the same set of penalties for all resamples, which, I think, is different from what the glmnet CV function does.
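The difference can be sketched with cv.glmnet itself, which accepts a user-supplied `lambda` sequence. This is a rough illustration under assumptions: the data are simulated and the grid `lambda_grid` (log10 range -4 to 0, 50 values) is invented for the example. Without `lambda`, glmnet derives the sequence from the data; with it, every fold is fit on the same fixed penalties, which is closer to what a tuning grid does.

```r
library(glmnet)

# Simulated data (an assumption for illustration)
set.seed(1)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))

# Default behaviour: glmnet chooses its own penalty sequence
cv_default <- cv.glmnet(x, y, family = "binomial", alpha = 0.7, nfolds = 10)

# Fixed grid (hypothetical values): the same penalties are used in every fold
lambda_grid <- 10^seq(0, -4, length.out = 50)
cv_fixed <- cv.glmnet(x, y, family = "binomial", alpha = 0.7,
                      lambda = lambda_grid, nfolds = 10)

range(cv_default$lambda)  # data-driven range
range(cv_fixed$lambda)    # range of the supplied grid
cv_fixed$lambda.min       # best penalty among the supplied values
```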

We also fix what IMO is a fairly significant bug.

MatthiasHerp · August 14, 2022, 11:56am

Thank you very much for the clarification, @Max. I understand why this is the cost of using the tidymodels framework. Nevertheless, I will build a workaround that does the hyperparameter tuning directly with glmnet in order to reduce my runtime.
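One possible shape for such a workaround is sketched below. It is not the code that was eventually used: the simulated data, the fixed grid `lambda_grid`, and the small number of repeats are all assumptions chosen to keep the example quick. The idea is to repeat cv.glmnet, collect the cross-validated error for each penalty, average across repeats, and pick the penalty with the lowest mean error.

```r
library(glmnet)

# Simulated data (an assumption for illustration)
set.seed(42)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))

# Hypothetical fixed grid so every repeat is evaluated on the same penalties
lambda_grid <- 10^seq(0, -4, length.out = 50)
n_repeats <- 5  # the thread uses 50; kept small here

# Each repeat draws new random folds and returns the CV error per penalty
cv_runs <- lapply(seq_len(n_repeats), function(i) {
  cv <- cv.glmnet(x, y, family = "binomial", alpha = 0.7,
                  lambda = lambda_grid, type.measure = "mse", nfolds = 10)
  data.frame(lambda = cv$lambda, cvm = cv$cvm)
})

# Average the CV error over repeats for each penalty and pick the minimum
all_runs <- do.call(rbind, cv_runs)
mean_cvm <- aggregate(cvm ~ lambda, data = all_runs, FUN = mean)
best_lambda <- mean_cvm$lambda[which.min(mean_cvm$cvm)]
best_lambda
```

Collecting the results in a long data frame and aggregating by penalty avoids relying on every repeat returning an error vector of identical length.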
