Why do I get different estimated ridge models with and without step_normalize() in the recipe?

I am fitting a ridge regression using the Hitters dataset as done in section 6.5 of ISLR. I noticed the following:

  • Without the step_normalize() step in the recipe, I get exactly the same estimated model as in the book
  • With the step_normalize() step, the resulting model is different

I understand that by default glmnet standardizes the variables before estimation, but if step_normalize() is applied first, that shouldn't change the result: glmnet would just be standardizing variables that already have mean 0 and standard deviation 1, so there shouldn't be any meaningful difference in the output. That said, the results are very different. I would appreciate any explanation of why this is. Thanks!
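(In case it matters, glmnet's internal standardization can also be switched off through the engine; standardize is glmnet's own argument, which set_engine() passes through. A sketch:)

ridge_model_no_std <- linear_reg(penalty = 11498, mixture = 0) %>%
  set_engine("glmnet", standardize = FALSE)  # disable glmnet's internal standardization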

library(ISLR2)
library(tidymodels)
tidymodels_prefer()

data <- Hitters %>% filter(!is.na(Salary))  # drop players with missing Salary, as in ISLR

hitter_recipe <- data %>% 
  recipe(Salary ~ . ) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_zv(all_predictors()) %>% 
  step_normalize(all_predictors()) 

# penalty = 11498 matches the lambda value examined in the ISLR lab; mixture = 0 is pure ridge
ridge_model <- linear_reg(penalty = 11498, mixture = 0) %>%
  set_engine("glmnet")

ridge_wf <- workflow() %>% 
  add_recipe(hitter_recipe) %>% 
  add_model(ridge_model)

ridge_fitted_with_step_normalize <- fit(ridge_wf, data)

tidy(ridge_fitted_with_step_normalize) # different output from the book

################
hitter_recipe_no_normalize <- data %>% 
  recipe(Salary ~ . ) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_zv(all_predictors())

ridge_wf_no_normalize <- workflow() %>% 
  add_recipe(hitter_recipe_no_normalize) %>% 
  add_model(ridge_model)

ridge_fitted_no_normalize <- fit(ridge_wf_no_normalize, data)

tidy(ridge_fitted_no_normalize) # gives the same output as in the book

Please notice that in the case of linear models, the estimated coefficient of a variable X can change depending on the range/scale of that variable. The coefficient compensates for the variable's magnitude in order to capture its true impact: if X takes huge values, the coefficient will tend to be small, and vice versa. For x = 1000 and coefficient = 0.01 you get the same effect as for x = 0.1 and coefficient = 100. After normalization, the coefficient no longer has to do that compensation job. glmnet standardizes internally but reports coefficients back on the scale of the data it was given, so when you normalize in the recipe, the reported coefficients are on the normalized scale rather than the original one, even though the underlying fit is essentially the same. I hope it helps. Please check the coefficients of your two models.
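A quick way to see this, as a sketch building on the objects from your question: multiply each original-scale coefficient by its predictor's standard deviation and compare with the normalized fit. If I'm reading glmnet's back-transformation right, the two columns should match up to small numerical differences (the intercepts differ by construction and are dropped by the join).

# Standard deviations of the dummy-encoded predictors from the baked training data
baked <- hitter_recipe_no_normalize %>% prep() %>% bake(new_data = NULL)
sds <- baked %>%
  select(-Salary) %>%
  summarise(across(everything(), sd)) %>%
  pivot_longer(everything(), names_to = "term", values_to = "sd")  # tidyr is attached by tidymodels

# Rescale the original-scale coefficients and put both fits side by side
tidy(ridge_fitted_no_normalize) %>%
  inner_join(sds, by = "term") %>%   # drops the intercept, which has no sd entry
  mutate(rescaled = estimate * sd) %>%
  left_join(
    tidy(ridge_fitted_with_step_normalize) %>% select(term, normalized = estimate),
    by = "term"
  )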