I have two questions regarding a classification model I'm building using xgboost:
- What is the syntax for incorporating tuning of the alpha (L1) and lambda (L2) regularization parameters?
- Do my hyperparameter ranges make any sense?
For #2, my grid currently looks like this. I am concerned about overfitting, but I'm not too happy with the ROC AUC results, so I've been wondering whether I could push some of these parameters further (e.g. increasing the maximum number of trees or the maximum tree depth).
set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),   # on the log10 scale used by dials
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(new_data = train_split)),
  learn_rate(range = c(-3, -1.3)),      # on the log10 scale used by dials
  stop_iter(range = c(10, 50)),
  size = 20,
  type = 'latin_hypercube'
)
For #1, I see alpha and lambda are in dials here, but I have the extremely simple question of where to actually put them when constructing a tuning grid. The other tuning hyperparameters go within boost_tree, with tuning ranges set in grid_space_filling. However, I get an error when I add penalty_L1 and penalty_L2 to boost_tree (unsurprising, because they aren't generic parameters?). So are they arguments within set_engine, with ranges then provided within grid_space_filling using the same syntax as the others?
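For reference, the attempt that errored looked roughly like this (trimmed to the relevant lines; the grid was unchanged):

# Rough sketch of what I tried: passing the dials names straight to boost_tree(),
# which errors -- presumably because penalty_L1/penalty_L2 aren't boost_tree() arguments
xgb_tm_fail <- boost_tree(
  trees = tune(),
  learn_rate = tune(),
  penalty_L1 = tune(),
  penalty_L2 = tune()
) %>%
  set_engine('xgboost', eval_metric = 'auc') %>%
  set_mode('classification')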
This is my current code for setting up the data split, recipe, engine, and tuning grid:
set.seed(123)
ini_split <- group_initial_split(input_data,
                                 prop = 0.8,
                                 group = application_number,
                                 strata = acceptance_status)
train_split <- training(ini_split)
test_split <- testing(ini_split)
recipe_1 <- recipe(acceptance_status ~ ., data = train_split) %>%
  step_unknown(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE)
xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc') %>%
  set_mode('classification')
set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(new_data = train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  size = 20,
  type = 'latin_hypercube'
)
Would incorporating them look like this (I want to set alpha to zero)? I also found some code suggesting that in set_engine I should be using xgboost's native alpha and lambda as arguments instead of penalty_L1 and penalty_L2 (a rough sketch of that variant is at the end of the post).
xgb_tm <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             penalty_L1 = 0,
             penalty_L2 = tune()) %>%
  set_mode('classification')
set.seed(45677)
xgb_grid <- grid_space_filling(
  trees(range = c(400, 2000)),
  tree_depth(range = c(3, 15)),
  min_n(range = c(5, 60)),
  loss_reduction(range = c(-5, 1.5)),
  sample_size = sample_prop(c(0.3, 0.8)),
  finalize(mtry(), prep(recipe_1) %>% bake(new_data = train_split)),
  learn_rate(range = c(-3, -1.3)),
  stop_iter(range = c(10, 50)),
  penalty_L2(range = c(-3, 1)),
  size = 20,
  type = 'latin_hypercube'
)
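For completeness, the alternative I came across would instead pass xgboost's own argument names through set_engine. A rough sketch of that variant (assuming set_engine simply forwards alpha and lambda to xgboost, and that tune() is allowed on engine arguments):

xgb_tm_alt <- boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  learn_rate = tune(),
  stop_iter = tune()
) %>%
  set_engine('xgboost',
             eval_metric = 'auc',
             alpha = 0,        # xgboost's native L1 penalty, fixed at zero
             lambda = tune()   # xgboost's native L2 penalty, marked for tuning
  ) %>%
  set_mode('classification')

Which of these two patterns (penalty_L2 via dials in the grid vs. lambda via set_engine) is the correct one?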