Parallelization of several model workflows contained in a tibble (tidymodels)

I am working on the calibration and specification of several models. To keep every model's process tracked, I have stacked them all in a single tibble like this:

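# Packages used throughout -----

library(tidymodels)  # recipes, parsnip, rsample, tune, workflows, dials, dplyr, purrr
library(future)      # plan()
library(furrr)       # future_pmap()

# Main label of the model and data for each situation -----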

Tibble_Data_models <- tibble(
  Species = unique(iris$Species) ,
  data    = iris %>%  group_split(Species)
)

Tibble_Data_models

For each dataset, I have created its respective recipe and corresponding model.

# Recipes ----

Recipe_creation <- function(data) {
  recipe(formula = as.formula(' Sepal.Length ~ Sepal.Width'), data = data) %>%
    step_corr(all_numeric_predictors(), threshold = 0.99)
}

Tibble_Data_models <- Tibble_Data_models %>% 
  mutate('Recipe' = map(.x = data, .f = Recipe_creation))

# model creation -----

Model_RF <- rand_forest(
  mode = 'regression',  # the outcome, Sepal.Length, is numeric
  engine = 'ranger',
  mtry = 50,
  trees = tune(),
  min_n = tune()
)

Tibble_Data_models <- Tibble_Data_models %>% 
  mutate('Model_SPEC_full' = rep( list(Model_RF), dim(.)[1] ))

I also added the resamples and the corresponding hyperparameter grid for each model configuration:

Tibble_Data_models <- Tibble_Data_models %>%
  mutate('Resamples'= map(data, ~ vfold_cv(..1, v = 10,strata = 'Species')))

Tibble_Data_models <- Tibble_Data_models %>%
  mutate('Grid_HP'= map(Model_SPEC_full , ~ grid_max_entropy( x = extract_parameter_set_dials(..1), size = 100 ) ))

Tibble_Data_models 

To fit the models, I wrapped the tuning function in purrr::safely() so that I do not get stuck if a model calibration fails.

F_safely_tune_grid <- purrr::safely(
 function(recipe, Model_SPEC_full, grid, resamples, seed = 1234567890, parallel_over = 'everything' ) {
   set.seed(seed)
   workflow(preprocessor = recipe, spec = Model_SPEC_full) %>% 
     tune_grid(
       grid = grid,
       resamples = resamples,
       control = control_grid(save_pred = FALSE, parallel_over = parallel_over))
})
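
For context, a function wrapped with purrr::safely() always returns a list with two elements, result and error, and one of them is NULL. A minimal sketch with a toy function (log() is only a stand-in here, it is not part of the workflow above):

```r
library(purrr)

# Wrap a toy function; log() stands in for the tuning function.
safe_log <- safely(log)

ok  <- safe_log(10)    # succeeds: result is filled, error is NULL
bad <- safe_log("a")   # fails: result is NULL, error holds the condition

is.null(ok$error)
is.null(bad$result)
conditionMessage(bad$error)
```

Applied to the HP_search column created further down, something like map(HP_search, "error") should list which model configurations failed and why.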

Here is the catch. When I try to parallelize the process, I do it like this:

plan( strategy = multisession, workers = 10)

Tibble_Data_models <- Tibble_Data_models %>%
  mutate('HP_search' = furrr::future_pmap(
    list(Recipe, Model_SPEC_full, Grid_HP, Resamples),
    F_safely_tune_grid
    #,seed = 1234567890,
    #parallel_over = 'everything'
  ))

But where do these workers go: to future_pmap or to F_safely_tune_grid? I would like the workers to be used inside each call, to allow a wide hyperparameter search, rather than across the list passed to future_pmap. Am I parallelizing this whole workflow correctly?
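
One way to probe this is to ask the future package how many workers it sees at each level. The sketch below is a minimal illustration, not a statement about how tune schedules its work. It assumes a single-level plan with 2 workers (a smaller number than the 10 above, chosen only to keep the example light). With such a plan, futures created inside a worker fall back to sequential processing, so the workers are taken by the outer level.

```r
library(future)

# Single-level plan, as in the question (2 workers here instead of 10).
plan(multisession, workers = 2)

# Workers visible from the main session (outer level).
outer_workers <- nbrOfWorkers()

# Workers visible from inside a future (inner level). Nested futures
# default to sequential processing when the plan has only one level.
inner_workers <- value(future(future::nbrOfWorkers()))

print(c(outer = outer_workers, inner = inner_workers))

# Return to sequential processing and shut the workers down.
plan(sequential)
```

If the inner count comes back as 1, the parallelism is being spent on the elements of the list rather than inside each tuning call.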

Also, is there any way to check whether any parallelization is actually happening, aside from microbenchmarking the functions?
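
A cheap check that does not rely on timing is to have each task report the process ID it ran in. The sketch below assumes a multisession plan with 2 workers and a dummy task that only returns Sys.getpid(). If the returned IDs differ from the main session's ID, the tasks ran in separate R processes.

```r
library(future)
library(furrr)

plan(multisession, workers = 2)

main_pid <- Sys.getpid()

# Each task only reports the ID of the R process that executed it.
pids <- future_map_int(1:4, ~ Sys.getpid())

# Under multisession, none of the tasks should run in the main process.
ran_elsewhere <- !(main_pid %in% pids)
print(ran_elsewhere)
print(unique(pids))

# Return to sequential processing and shut the workers down.
plan(sequential)
```

The same idea could be applied to the real workflow by returning Sys.getpid() alongside the tuning result, or simply by watching the R processes in the operating system's task monitor while the search runs.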

I consulted several posts, especially this post by SimonPCouch, but the doMC package (library(doMC)) no longer seems to be available on CRAN.
