Tuning xgboost regularization hyperparameters in parsnip (and are the hyperparameter ranges sane)

I have two questions regarding a classification model I'm building using xgboost:

  1. What is the syntax for incorporating tuning of the alpha (L1) and lambda (L2) regularization parameters?
  2. Do my hyperparameter ranges make any sense?

For #2, my grid currently looks like this. I'm concerned about overfitting, but I'm also not happy with the ROC AUC results, so I've been wondering whether I could push some of these parameters further (increasing max trees, max depth, etc.).

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),   # log10 scale: roughly 1e-5 to ~32
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),      # log10 scale: roughly 0.001 to 0.05
  stop_iter(range = c(10, 50)),
  size = 20,
  type = 'latin_hypercube'
)

For #1, I can see that alpha and lambda are available in dials (as penalty_L1() and penalty_L2()), but I have the extremely simple question of where to actually put them when constructing a tuning grid. The other tuning hyperparameters go inside boost_tree(), with their ranges set in grid_space_filling(). However, I get an error when I add penalty_L1 and penalty_L2 to boost_tree() (unsurprising, since they aren't main model arguments?). So should they be arguments within set_engine(), with their ranges then provided in grid_space_filling() using the same syntax as the others?

This is my current code for setting up the engine and tuning grid:


library(tidymodels)

set.seed(123)
# Grouped, stratified 80/20 split: rows from the same application stay in the same partition
ini_split <- group_initial_split(input_data,
                                 prop = 0.8,
                                 group = application_number,
                                 strata = acceptance_status)

train_split <- training(ini_split)

test_split <- testing(ini_split)

recipe_1 <- recipe(acceptance_status ~ ., data = train_split) %>%
  step_unknown(all_nominal_predictors()) %>%             # missing nominal values become an "unknown" level
  step_dummy(all_nominal_predictors(), one_hot = TRUE)   # one-hot encode nominal predictors

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc') %>%
  set_mode('classification')

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  size = 20,
  type = 'latin_hypercube'
)

Would incorporating them look like the following (I want to fix alpha at zero)? I also found some code suggesting that, in set_engine(), I should be using alpha and gamma as the argument names instead of penalty_L1 and penalty_L2.

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             penalty_L1 = 0,
             penalty_L2 = tune()) %>%
  set_mode('classification')

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  penalty_L2(range = c(-3, 1)),
  size = 20,
  type = 'latin_hypercube'
)

From the documentation, the L2 penalty is handled by parsnip and the L1 penalty is named alpha.
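
As a rough sketch of how that could look, using xgboost's own argument names (alpha and lambda) and assuming tune can match those engine arguments to dials::penalty_L1()/penalty_L2(); treat this as something to check against the boost_tree() xgboost details page rather than a verbatim recipe. Fixing alpha at zero and tuning lambda:

xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),   # xgboost's gamma
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             alpha = 0,            # L1 penalty fixed at zero
             lambda = tune()) %>%  # L2 penalty tuned as an engine argument
  set_mode('classification')

# Name the dials parameter after the engine argument so the grid column
# matches the tune() id:
set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  lambda = penalty_L2(range = c(-3, 1)),   # log10 scale
  size = 20,
  type = 'latin_hypercube'
)

If tuning complains that it cannot find a dials object for lambda, you can also build the parameter set yourself with extract_parameter_set_dials() and update() before tuning.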

For gamma, that is listed in parsnip as loss_reduction (so you don't have to memorize Greek letters that differ from model to model).

For mtry, I would specify it in terms of a proportional range. See the details page for that model.
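
A minimal sketch of that, assuming the counts engine argument and dials::mtry_prop() described on that details page (spec trimmed to the relevant arguments):

xgb_tm <- boost_tree(
  mtry = tune(),
  trees = tune(),
  learn_rate = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             counts = FALSE) %>%   # interpret mtry as a proportion of predictors
  set_mode('classification')

set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  learn_rate(range = c(-3, -1.3)),
  mtry = mtry_prop(range = c(0.25, 1)),   # proportion of columns sampled; no finalize() step needed
  size = 20,
  type = 'latin_hypercube'
)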
