I'm sure this answer is in the documentation somewhere, but I'm not having any luck finding it. When implementing lasso in tidymodels, do you need to `step_normalize()` the numeric variables first, even though the `glmnet()` function uses `standardize = TRUE`? Thanks!
Hi @lisalendway,
I would use the template recipe from `usemodels::use_glmnet()`:
See examples: https://usemodels.tidymodels.org/reference/templates.html#examples
```r
library(usemodels)
library(palmerpenguins)
data(penguins)
use_glmnet(species ~ ., data = penguins)
#> glmnet_recipe <-
#>   recipe(formula = species ~ ., data = penguins) %>%
#>   step_novel(all_nominal(), -all_outcomes()) %>%
#>   step_dummy(all_nominal(), -all_outcomes()) %>%
#>   step_zv(all_predictors()) %>%
#>   step_normalize(all_predictors(), -all_nominal())
```
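A minimal sketch of plugging that template into an actual lasso fit, assuming the tidymodels, palmerpenguins, and glmnet packages are installed. The object names, the `penalty = 0.01` value, and dropping incomplete rows (glmnet does not accept missing values) are my own illustrative choices, not part of the template. I also swapped `all_nominal(), -all_outcomes()` for the newer `all_nominal_predictors()` / `all_numeric_predictors()` selectors, which target the same columns.

```r
library(tidymodels)  # attaches recipes, parsnip, workflows, rsample, tidyr, ...

set.seed(123)

# glmnet cannot handle missing values, so drop incomplete rows for this sketch
penguins_complete <- tidyr::drop_na(palmerpenguins::penguins)

# Stratified train/test split
penguin_split <- initial_split(penguins_complete, strata = species)
train_data <- training(penguin_split)
test_data  <- testing(penguin_split)

# Recipe adapted from the use_glmnet() template; the means and SDs used by
# step_normalize() are estimated from train_data only
lasso_rec <- recipe(species ~ ., data = train_data) %>%
  step_novel(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())

# mixture = 1 gives a pure lasso penalty; penalty = 0.01 is an arbitrary value
lasso_spec <- multinom_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet")

lasso_wf <- workflow() %>%
  add_recipe(lasso_rec) %>%
  add_model(lasso_spec)

lasso_fit <- fit(lasso_wf, data = train_data)

# predict() bakes test_data with the training-set statistics before predicting
preds <- predict(lasso_fit, new_data = test_data)
head(preds)
```

Because the recipe lives inside the workflow, you never compute normalization statistics on the test set yourself; in practice you would tune `penalty` rather than fixing it.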
The advantage of normalizing inside the recipe is that the same step is also applied when you "bake" the test set, using the means and standard deviations estimated from the training set. Your predictors then end up on the same scale in both the training and testing sets, which may make them easier to interpret and visualize (`skip = FALSE` is the default, so the step runs on new data too: https://recipes.tidymodels.org/reference/step_normalize.html). I would guess there is no harm in glmnet standardizing again, since the variables will already be normalized. An added bonus is that you won't have to worry about data leakage when using the recipe.
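To make the "training statistics" point concrete, here is a base R sketch of what centering and scaling amounts to, using made-up numbers (`train_x` and `test_x` are hypothetical toy vectors, not from the thread). It contrasts applying the training mean/SD to the test set with the leaky alternative of normalizing the test set using its own statistics.

```r
# Hypothetical toy predictor values
train_x <- c(2, 4, 6, 8)
test_x  <- c(5, 10)

# "prep": estimate centering and scaling statistics on the training set only
train_mean <- mean(train_x)  # 5
train_sd   <- sd(train_x)

# "bake": apply the *training* statistics to both sets
train_norm <- (train_x - train_mean) / train_sd
test_norm  <- (test_x - train_mean) / train_sd

# For contrast: normalizing the test set with its own statistics
test_own <- as.numeric(scale(test_x))

print(test_norm)  # 5 maps to 0, because 5 is the training mean
print(test_own)   # 5 maps to about -0.71 instead
```

With training statistics, a given raw value always maps to the same normalized value, no matter what else happens to be in the new data.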
Right! I wasn't thinking about the step of applying it to new data. Now that makes total sense. And thanks for the reference, that's exactly what I was looking for.