Hi,
I really like tidymodels, but recently my models have grown in size along with the training data, which I would like to avoid. To prevent this, my current code fits the model in a separate environment.
In glm you can use y = FALSE and model = FALSE to keep the response vector and the model frame out of the fitted object:
glm(y ~ x, data = d, family = binomial(), model = FALSE, y = FALSE)
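A quick base-R check of what these two arguments do and do not remove. This is a sketch on simulated data; n, d, fit_full and fit_small are illustrative names. It compares the number of bytes serialize() produces (saveRDS() writes the same stream, compressed by default) and shows that glm still stores the data frame passed as data in its data component, plus n-length pieces such as residuals, weights and the QR decomposition, so the object keeps growing with the training set even with model = FALSE and y = FALSE.

```r
# Simulated data (hypothetical); only the number of rows matters here
set.seed(1)
n <- 10000
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x))

fit_full  <- glm(y ~ x, data = d, family = binomial())
fit_small <- glm(y ~ x, data = d, family = binomial(), model = FALSE, y = FALSE)

# Bytes in the serialized stream (what saveRDS() writes, before compression)
size_full  <- length(serialize(fit_full, NULL))
size_small <- length(serialize(fit_small, NULL))

is.null(fit_small$model)      # model frame dropped
is.null(fit_small$y)          # response vector dropped
identical(fit_small$data, d)  # the training data frame is still stored
length(fit_small$residuals)   # n-length components remain
```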
However, if I understand correctly, these arguments cannot be passed to glm when using tidymodels, right?
To reduce the model size, I have also tried removing parts of the object before saving, but either the removed parts seem to be pulled back in from the environment when the object is saved, or removing them breaks the models' ability to predict.
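The trade-off between stripping components and keeping predict() working can be seen on a plain lm fit. In this sketch (simulated data; fit, slim, broken and new_d are illustrative names), dropping residuals, fitted.values, effects and the model frame leaves predict() on new data working, while dropping qr breaks it, because predict.lm() reads the pivot from the QR decomposition. The qr component itself holds an n-by-p matrix, so even the slimmed object still scales with the number of training rows.

```r
set.seed(1)
n <- 10000
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
fit <- lm(y ~ x, data = d)
new_d <- data.frame(x = c(-1, 0, 1))

# Remove n-length components that predict() on new data does not use
slim <- fit
slim$residuals <- NULL
slim$fitted.values <- NULL
slim$effects <- NULL
slim$model <- NULL
pred_slim <- predict(slim, newdata = new_d)  # same values as predict(fit, newdata = new_d)

# The QR decomposition is required: without it predict() raises an error
broken <- slim
broken$qr <- NULL
pred_broken <- tryCatch(predict(broken, newdata = new_d),
                        error = function(e) conditionMessage(e))

dim(slim$qr$qr)  # n rows: the slimmed fit still grows with the training data
```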
So, how can I ensure that models created with tidymodels do not save any training data and stay the same size regardless of the size of the training data?
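One commonly suggested route is the butcher package, whose butcher() function removes components it identifies as unnecessary for prediction (captured environments, calls, fitted values and similar). The sketch below assumes recent versions of butcher, workflows, recipes and parsnip, with workflows providing butcher methods for fitted workflow objects; the data and object names are illustrative. It compares serialized sizes and checks that predictions are unchanged. It may not make the saved object fully independent of the training-set size, so the result is worth checking on the real models.

```r
library(parsnip)
library(recipes)
library(workflows)
library(butcher)

# Hypothetical training data; only the number of rows matters here
set.seed(1)
n <- 5000
xy_all <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
xy_all$y <- 2 * xy_all$x1 - xy_all$x2 + rnorm(n)

rec <- recipe(y ~ x1 + x2, data = xy_all) %>%
  step_normalize(all_numeric_predictors())

wf_fit <- workflow() %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  add_recipe(rec) %>%
  fit(data = xy_all)

# Drop components that butcher considers unnecessary for prediction
wf_small <- butcher(wf_fit)

size_full  <- length(serialize(wf_fit, NULL))
size_small <- length(serialize(wf_small, NULL))

# Predictions on new rows should be unchanged
new_obs <- data.frame(x1 = c(0, 1), x2 = c(1, 0))
pred_full  <- predict(wf_fit, new_data = new_obs)$.pred
pred_small <- predict(wf_small, new_data = new_obs)$.pred
```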
I use the code below. It worked some time ago, but now it saves inflated models.
I have tried adding parsnip::set_engine("lm", y = FALSE, model = FALSE) and parsnip::fit(wf_final, data = xy_all, y = FALSE, model = FALSE), with no luck.
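Engine arguments given to set_engine() are forwarded by parsnip to the underlying fitting function, so one way to see whether y = FALSE and model = FALSE actually arrived is to inspect the raw glm object, which parsnip keeps in the fit element of the fitted model (for a fitted workflow, workflows::extract_fit_engine() returns that same underlying object). A sketch with simulated data, assuming parsnip is installed; d, spec and glm_fit are illustrative names. Even when the arguments arrive, lm and glm objects keep other n-length components (residuals, weights, the QR decomposition), so these two arguments alone will not keep the size constant.

```r
library(parsnip)

# Simulated data (hypothetical); logistic_reg() needs a factor outcome
set.seed(1)
d <- data.frame(x = rnorm(1000))
d$y <- factor(rbinom(1000, 1, plogis(d$x)))

# Engine arguments are forwarded to stats::glm()
spec <- logistic_reg() %>%
  set_engine("glm", model = FALSE, y = FALSE)

glm_fit <- fit(spec, y ~ x, data = d)

# The underlying glm object is stored in glm_fit$fit
is.null(glm_fit$fit$model)
is.null(glm_fit$fit$y)
head(predict(glm_fit, new_data = d))  # prediction still works
```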
model_save_small_size <- function(xy_all, final_recipe, penalty, mixture, model, nr_predictors) {
  # Fit inside a dedicated environment so the fitted objects do not capture the caller's frame
  env_final_model <- new.env(parent = globalenv())
  env_final_model$xy_all <- xy_all
  env_final_model$final_recipe <- final_recipe
  # statisticalMode() is a user-defined helper (not shown) returning the most frequent value
  env_final_model$penalty_mode <- statisticalMode(penalty)
  env_final_model$mixture_mode <- statisticalMode(mixture)
  env_final_model$model <- model
  env_final_model$nr_predictors <- nr_predictors
  env_final_model$statisticalMode <- statisticalMode
  env_final_model$`%>%` <- `%>%`

  # with() evaluates the block inside env_final_model, so objects assigned here
  # (final_predictive_model_spec, wf_final) are also stored in env_final_model
  final_predictive_model <- with(env_final_model, {
    if (nr_predictors > 3) {
      # Penalized model (glmnet) using the most frequent tuned penalty and mixture
      final_predictive_model_spec <-
        if (model == "regression") {
          parsnip::linear_reg(penalty = penalty_mode, mixture = mixture_mode)
        } else if (model == "logistic") {
          parsnip::logistic_reg(mode = "classification", penalty = penalty_mode, mixture = mixture_mode)
        } else if (model == "multinomial") {
          parsnip::multinom_reg(mode = "classification", penalty = penalty_mode, mixture = mixture_mode)
        }
      final_predictive_model_spec <- final_predictive_model_spec %>%
        parsnip::set_engine("glmnet")
      # Create the workflow (the recipe supplies the variable roles)
      wf_final <- workflows::workflow() %>%
        workflows::add_model(final_predictive_model_spec) %>%
        workflows::add_recipe(final_recipe[[1]])
      parsnip::fit(wf_final, data = xy_all)
    } else if (nr_predictors == 3) {
      # Unpenalized models for the three-predictor case
      final_predictive_model_spec <-
        if (model == "regression") {
          parsnip::linear_reg(mode = "regression") %>%
            parsnip::set_engine("lm")
        } else if (model == "logistic") {
          parsnip::logistic_reg(mode = "classification") %>%
            parsnip::set_engine("glm")
        } else if (model == "multinomial") {
          # Recent parsnip versions require a single penalty value for glmnet; 0 means no penalty
          parsnip::multinom_reg(mode = "classification", penalty = 0) %>%
            parsnip::set_engine("glmnet")
        }
      wf_final <- workflows::workflow() %>%
        workflows::add_model(final_predictive_model_spec) %>%
        workflows::add_recipe(final_recipe[[1]])
      parsnip::fit(wf_final, data = xy_all)
    } else {
      # Previously this case silently returned NULL
      stop("nr_predictors must be at least 3")
    }
  })

  # Drop the training data and recipe from the environment the fit may reference
  remove("final_recipe", envir = env_final_model)
  remove("xy_all", envir = env_final_model)
  return(final_predictive_model)
}
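One possible source of the extra size in the function above (a hypothesis, not something confirmed here): with() evaluates its block inside env_final_model, so final_predictive_model_spec and wf_final are created in that environment, and the two remove() calls leave them there. parsnip records arguments such as penalty = penalty_mode as quosures tied to the environment they were written in, so the fitted model can keep a reference to env_final_model, and saving the model then writes out that environment as well, including wf_final and the recipe inside it (a recipe built with recipe(formula, data) may keep a template of that data). The base-R sketch below mimics this with a formula standing in for the captured reference and a large vector standing in for wf_final; the cleanup keeps only penalty_mode and mixture_mode, assuming those are the only names still needed at prediction time.

```r
# Mimic env_final_model from the function above (values are illustrative)
env_final_model <- new.env(parent = globalenv())
env_final_model$penalty_mode <- 0.01
env_final_model$mixture_mode <- 1

captured <- with(env_final_model, {
  wf_final <- rnorm(1e6)  # stand-in for an intermediate created inside with()
  y ~ x                   # stand-in for an object that captures env_final_model
})

# The intermediate was assigned into env_final_model, not a temporary scope
exists("wf_final", envir = env_final_model, inherits = FALSE)
size_before <- length(serialize(captured, NULL))

# Keep only what prediction may still need and drop everything else
keep <- c("penalty_mode", "mixture_mode")
rm(list = setdiff(ls(env_final_model), keep), envir = env_final_model)
size_after <- length(serialize(captured, NULL))
```

Removing names selectively, rather than clearing the whole environment, matters because deleting values the fitted spec still refers to is one way to end up with a model that no longer predicts.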
Any help is much appreciated.