Tuning xgboost regularization hyperparameters in parsnip (and are the hyperparameter ranges sane?)

I have two questions regarding a classification model I'm building using xgboost:

  1. What is the syntax for incorporating tuning of the alpha (L1) and lambda (L2) regularization parameters?
  2. Do my hyperparameter ranges make any sense?

For #2, my grid currently looks like this. I am concerned about overfitting, but I'm not too happy with the ROC AUC results, so I've been wondering whether I could push some of these parameters further (increasing max trees, max depth, etc.).

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  size = 20,
  type = 'latin_hypercube'
)

For #1, I see alpha and lambda are in dials here, but I have the extremely simple question of where to actually put them when constructing a tuning grid. Other tuning hyperparameters go within boost_tree with tuning ranges set in grid_space_filling. However, I get an error when I add penalty_L1 and penalty_L2 to boost_tree (unsurprising because they aren't generic parameters?). So are they arguments within set_engine, and then ranges provided within grid_space_filling with the same syntax as the others?

This is my current code for setting up an engine and tuning grid:

set.seed(123)
ini_split <- group_initial_split(input_data,
                                 prop = 0.8,
                                 group = application_number,
                                 strata = acceptance_status)

train_split <- training(ini_split)

test_split <- testing(ini_split)

# outcome assumed to be acceptance_status (the stratification variable above)
recipe_1 <- recipe(acceptance_status ~ ., data = train_split) %>%
  step_unknown(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE)

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc') %>%
  set_mode('classification')

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  size = 20,
  type = 'latin_hypercube'
)

Would incorporating them look like this (I want to set alpha to zero)? I also found some code suggesting that, in set_engine, I should be using alpha and gamma as arguments instead of penalty_L1 and penalty_L2.

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             penalty_L1 = 0,
             penalty_L2 = tune()) %>%
  set_mode('classification')

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  penalty_L2(range = c(-3, 1)),
  size = 20,
  type = 'latin_hypercube'
)

From the documentation, the L2 penalty (xgboost's lambda) is already handled by parsnip, and the L1 penalty is named alpha.

For gamma, that is listed in parsnip as loss_reduction (so you don't have to memorize Greek letters that differ across models).

For mtry, I would specify it in terms of a proportional range. See the details page for that model.
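
Putting those points together, here is a minimal sketch of the naming (other main arguments omitted for brevity, and the zero value for alpha is just an example): gamma maps to the main argument loss_reduction, alpha and lambda are passed through set_engine(), and counts = FALSE makes mtry a proportion of columns rather than a count.

xgb_tm <- boost_tree(
  loss_reduction = tune(),   # xgboost's gamma
  mtry = tune()              # treated as a proportion when counts = FALSE
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             alpha = 0,        # L1 penalty, held at zero here
             lambda = tune(),  # L2 penalty, to be tuned
             counts = FALSE) %>%
  set_mode('classification')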


Thank you! I think I am still struggling with the syntax though. This produces an error:

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             alpha = 0,
             lambda = tune(),
             counts = FALSE) %>%
  set_mode('classification')

xgb_workflow <- workflow() %>%
  add_model(xgb_tm) %>%
  add_recipe(recipe_1)

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(800, 3000)),
  tree_depth(range = c(8, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-2, 0)),
  sample_size = sample_prop(c(0.4, 0.8)),
  mtry(range = c(0.1, 1)),
  learn_rate(range = c(-2.5, -1.5)),
  stop_iter(range = c(25, 50)),
  lambda(range = c(0.5, 10)),
  size = 40,
  type = 'latin_hypercube'
)
Error in `mtry()`:
! An integer is required for the range and these do not appear to be whole numbers: 0.1.

Shouldn't counts = FALSE allow mtry() to use a proportion?

Edit: basically, I have read the XGBoost documentation and have been digging through the parsnip documentation, but I have not found a central location or example showing the exact syntax needed to implement these parameters. I now realize I should be using alpha and lambda rather than penalty_L1() and penalty_L2() from dials, but I could not find guidance saying to use those argument names instead of the dials functions (and I am still unsure what those functions are for). That is why I am asking for help here.

The function mtry() still wants integers; mtry_prop() will take doubles between zero and one.
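
For instance, a tiny sketch calling the dials functions directly (assuming dials is attached; the ranges are placeholders):

# mtry() is defined on a whole number of predictors:
mtry(range = c(5L, 20L))

# mtry_prop() is defined on a proportion of predictors, which is what the
# xgboost engine uses when counts = FALSE:
mtry_prop(range = c(0.1, 1))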

I find it better to make a parameter object from the workflow/model/recipe and then update the parameters that you want to change. It's also reusable.

We should make a summary function for the parameters objects to make things like this more clear (summary object for parameter sets · Issue #384 · tidymodels/dials · GitHub).

Here's a reprex with some notes on the differences. I'm assuming that there are no tuning parameters in your recipe.

library(tidymodels)
tidymodels_prefer()

set.seed(1)
dat <- sim_classification(500)
rs <- vfold_cv(dat)

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             alpha = 0,
             lambda = tune(),
             counts = FALSE) %>%
  set_mode('classification')

xgb_workflow <- workflow() %>%
  add_model(xgb_tm) %>%
  add_formula(class ~ .) # NOTE I made this up since there was no recipe

# NOTE: changes to work with a parameter set
xgb_param <- 
  xgb_workflow %>% 
  extract_parameter_set_dials() %>%
    update(
      trees = trees(c(800, 3000)),
      tree_depth = tree_depth(c(8, 15)),
      min_n = min_n(c(5, 60)),
      loss_reduction = loss_reduction(c(-2, 0)),
      sample_size = sample_prop(c(0.4, 0.8)),
      # NOTE use mtry_prop when `counts = FALSE`
      mtry = mtry_prop(c(0.1, 1)),
      learn_rate = learn_rate(c(-2.5, -1.5)),
      stop_iter = stop_iter(c(25, 50)),
      # NOTE: by default it expects log10 units
      lambda = penalty(log10(c(0.5, 10)))
    )

set.seed(45677)
xgb_grid <- grid_space_filling(xgb_param, size = 10, type = 'latin_hypercube')

set.seed(382)
xgb_res <- xgb_workflow %>% tune_grid(rs, grid = xgb_grid)

show_best(xgb_res, metric = "roc_auc")
#> # A tibble: 5 × 15
#>    mtry trees min_n tree_depth learn_rate loss_reduction sample_size stop_iter
#>   <dbl> <int> <int>      <int>      <dbl>          <dbl>       <dbl>     <int>
#> 1 0.655  1837     9         13    0.0242          0.399        0.776        31
#> 2 0.216  2375    13          8    0.00549         0.213        0.625        33
#> 3 0.330   812    18          9    0.0122          0.850        0.500        26
#> 4 0.482  2337    23         11    0.00396         0.116        0.739        46
#> 5 0.184  1386    30         13    0.00657         0.0249       0.541        48
#> # ℹ 7 more variables: lambda <dbl>, .metric <chr>, .estimator <chr>,
#> #   mean <dbl>, n <int>, std_err <dbl>, .config <chr>

Created on 2025-04-15 with reprex v2.1.1

This is incredibly helpful, thank you so much! Could you explain a bit more about the advantages of creating a parameter set versus defining ranges directly in the grid, given that the arguments in the parameter set might be specific to xgboost (i.e., they couldn't be used in a different type of model)? I'm definitely not arguing against it, just trying to understand more.