Tuning xgboost regularization hyperparameters in parsnip (and are the hyperparameter ranges sane)

I have two questions regarding a classification model I'm building using xgboost:

  1. What is the syntax for incorporating tuning of the alpha (L1) and lambda (L2) regularization parameters?
  2. Do my hyperparameter ranges make any sense?

For #2, my grid currently looks like this. I am concerned about overfitting, but I'm also not happy with the ROC AUC results, so I've been wondering whether I could push some of these parameters further (increasing max trees, max depth, etc.).

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  size = 20,
  type = 'latin_hypercube'
)

For #1, I see alpha and lambda are in dials here, but I have the extremely simple question of where to actually put them when constructing a tuning grid. Other tuning hyperparameters go within boost_tree with tuning ranges set in grid_space_filling. However, I get an error when I add penalty_L1 and penalty_L2 to boost_tree (unsurprising because they aren't generic parameters?). So are they arguments within set_engine, and then ranges provided within grid_space_filling with the same syntax as the others?

This is my current code for setting up an engine and tuning grid:


set.seed(123)
ini_split <- group_initial_split(input_data,
                                 prop = 0.8,
                                 group = application_number,
                                 strata = acceptance_status)

train_split <- training(ini_split)

test_split <- testing(ini_split)

recipe_1 <- recipe(acceptance_status ~ ., data = train_split) %>%
  step_unknown(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE)

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc') %>%
  set_mode('classification')

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  size = 20,
  type = 'latin_hypercube'
)

Would incorporating them look like this (I want to set alpha to zero)? I also found some code suggesting that, in set_engine, I should be using alpha and gamma as arguments instead of penalty_L1 and penalty_L2.

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             penalty_L1 = 0,
             penalty_L2 = tune()) %>%
  set_mode('classification')

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  penalty_L2(range = c(-3, 1)),
  size = 20,
  type = 'latin_hypercube'
)

From the documentation, the L2 penalty is handled by parsnip and the L1 penalty is named alpha.

For gamma, that is listed in parsnip as loss_reduction (so you don't have to memorize Greek letters that differ across models).

For mtry, I would specify it in terms of a proportional range. See the details page for that model.
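
To make that mapping concrete, here is a minimal sketch (assuming the tidymodels packages are attached, as in the code above; the values are placeholders rather than recommendations):

boost_tree(loss_reduction = tune(),  # xgboost's gamma
           mtry = tune()) %>%
  set_engine('xgboost',
             alpha  = 0,       # L1 penalty, fixed at zero
             lambda = tune(),  # L2 penalty, tagged for tuning
             counts = FALSE    # treat mtry as a proportion of predictors
  ) %>%
  set_mode('classification')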

Thank you! I think I am still struggling with the syntax though. This produces an error:

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             alpha = 0,
             lambda = tune(),
             counts = FALSE) %>%
  set_mode('classification')

xgb_workflow <- workflow() %>%
  add_model(xgb_tm) %>%
  add_recipe(recipe_1)

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(800, 3000)),
  tree_depth(range = c(8, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-2, 0)),
  sample_size = sample_prop(c(0.4, 0.8)),
  mtry(range = c(0.1, 1)),
  learn_rate(range = c(-2.5, -1.5)),
  stop_iter(range = c(25, 50)),
  lambda(range = c(0.5, 10)),
  size = 40,
  type = 'latin_hypercube'
)
Error in `mtry()`:
! An integer is required for the range and these do not appear to be whole numbers: 0.1.

Shouldn't counts = FALSE allow mtry() to use a proportion?

Edit: Basically, I have read the XGBoost documentation and have been digging through the parsnip documentation, but I have not found a central location or example of the exact syntax needed to implement these parameters. I realize now that I should be using alpha and lambda rather than penalty_L1() and penalty_L2() from dials, but I was unable to find guidance saying to use those argument names instead of the dials functions (and I am still unsure what the dials functions are for). That's why I am trying to get help with this.
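
(For context, penalty_L1() and penalty_L2() are dials functions that create parameter objects describing tuning ranges; they are not arguments of boost_tree(), and the corresponding xgboost engine arguments are named alpha and lambda. A minimal sketch, assuming dials is attached:)

library(dials)

# These return parameter objects meant for building parameter sets and grids;
# they are not passed to the model specification directly.
penalty_L1()
penalty_L2()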

The function mtry() still wants integers; mtry_prop() will take doubles between zero and one.
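
For example (a quick illustration, assuming dials is attached):

mtry(range = c(1L, 20L))       # counts of predictors: whole numbers only
mtry_prop(range = c(0.1, 1))   # proportion of predictors: doubles between 0 and 1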

I find it better to make a parameter object from the workflow/model/recipe and then update the parameters that you want to change. It's also reusable.

We should make a summary function for parameter objects to make things like this clearer (summary object for parameter sets · Issue #384 · tidymodels/dials · GitHub).

Here's a reprex with some notes on differences. I'm assuming that there are not tuning parameters in your recipe.

library(tidymodels)
tidymodels_prefer()

set.seed(1)
dat <- sim_classification(500)
rs <- vfold_cv(dat)

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             alpha = 0,
             lambda = tune(),
             counts = FALSE) %>%
  set_mode('classification')

xgb_workflow <- workflow() %>%
  add_model(xgb_tm) %>%
  add_formula(class ~ .) # NOTE I made this up since there was no recipe

# NOTE: changes to work with a parameter set
xgb_param <- 
  xgb_workflow %>% 
  extract_parameter_set_dials() %>%
    update(
      trees = trees(c(800, 3000)),
      tree_depth = tree_depth(c(8, 15)),
      min_n = min_n(c(5, 60)),
      loss_reduction = loss_reduction(c(-2, 0)),
      sample_size = sample_prop(c(0.4, 0.8)),
      # NOTE use mtry_prop when `counts = FALSE`
      mtry = mtry_prop(c(0.1, 1)),
      learn_rate = learn_rate(c(-2.5, -1.5)),
      stop_iter = stop_iter(c(25, 50)),
      # NOTE: by default it expects log10 units
      lambda = penalty(log10(c(0.5, 10)))
    )

set.seed(45677)
xgb_grid <- grid_space_filling(xgb_param, size = 10, type = 'latin_hypercube')

set.seed(382)
xgb_res <- xgb_workflow %>% tune_grid(rs, grid = xgb_grid)

show_best(xgb_res, metric = "roc_auc")
#> # A tibble: 5 × 15
#>    mtry trees min_n tree_depth learn_rate loss_reduction sample_size stop_iter
#>   <dbl> <int> <int>      <int>      <dbl>          <dbl>       <dbl>     <int>
#> 1 0.655  1837     9         13    0.0242          0.399        0.776        31
#> 2 0.216  2375    13          8    0.00549         0.213        0.625        33
#> 3 0.330   812    18          9    0.0122          0.850        0.500        26
#> 4 0.482  2337    23         11    0.00396         0.116        0.739        46
#> 5 0.184  1386    30         13    0.00657         0.0249       0.541        48
#> # ℹ 7 more variables: lambda <dbl>, .metric <chr>, .estimator <chr>,
#> #   mean <dbl>, n <int>, std_err <dbl>, .config <chr>

Created on 2025-04-15 with reprex v2.1.1

This is incredibly helpful, thank you so much! Could you explain more about the advantages of creating a parameter set versus defining the ranges directly in the grid, given that the arguments in the parameter set might be specific to xgboost (i.e., they couldn't be used with a different type of model)? I'm definitely not arguing against it, just trying to understand more.
