I am fitting a ridge regression using the Hitters dataset as done in section 6.5 of ISLR. I noticed the following:
- Without the step_normalize() step in the recipe, I get exactly the same estimated model as in the book
- With the step_normalize() step, the resulting model is different
I understand that, by default, glmnet standardizes the variables before estimation, but adding step_normalize() first shouldn't change the result: glmnet would just be standardizing variables that already have mean 0 and standard deviation 1, so the outputs should be essentially identical. However, the results are very different. I would appreciate any explanation of why this is. Thanks!
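For what it's worth, I believe glmnet's own standardization can be switched off through set_engine(), assuming standardize is the glmnet argument that controls it, so something like the sketch below would be one way to isolate the effect of step_normalize() (I haven't verified this is the right knob):

# sketch only: pass standardize = FALSE through set_engine() to glmnet::glmnet()
ridge_model_raw_scale <- linear_reg(penalty = 11498, mixture = 0) %>%
  set_engine("glmnet", standardize = FALSE)

Here is my reproducible example: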
library(ISLR2)
library(tidymodels)
tidymodels_prefer()
# drop players with a missing Salary
data <- Hitters %>% filter(!is.na(Salary))

# recipe WITH step_normalize()
hitter_recipe <- data %>%
  recipe(Salary ~ .) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())
# ridge regression (mixture = 0) at a single penalty value
ridge_model <- linear_reg(penalty = 11498, mixture = 0) %>%
  set_engine("glmnet")

ridge_wf <- workflow() %>%
  add_recipe(hitter_recipe) %>%
  add_model(ridge_model)
ridge_fitted_with_step_normalize <- fit(ridge_wf, data)
tidy(ridge_fitted_with_step_normalize) # different output than in the book
################
# same recipe but WITHOUT step_normalize()
hitter_recipe_no_normalize <- data %>%
  recipe(Salary ~ .) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors())
ridge_wf_no_normalize <- workflow() %>%
  add_recipe(hitter_recipe_no_normalize) %>%
  add_model(ridge_model)
ridge_fitted_no_normalize <- fit(ridge_wf_no_normalize, data)
tidy(ridge_fitted_no_normalize) # gives the same output as in the book
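To see the differences concretely, this is how I'm putting the two sets of coefficients side by side (just a comparison helper, not something from the book):

# join the two tidy() outputs by term to compare the estimates directly
left_join(
  tidy(ridge_fitted_with_step_normalize),
  tidy(ridge_fitted_no_normalize),
  by = "term",
  suffix = c("_normalized", "_raw")
)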