mtry tune range is integers when counts = FALSE with xgboost engine

The parsnip documentation page for boosted trees via xgboost (details_boost_tree_xgboost) states that the user can pass the counts = FALSE argument to set_engine() to supply mtry values within [0, 1]. If mtry is fixed at a value in [0, 1], I can use tune_sim_anneal() to tune the other parameters. But when mtry = tune(), the mtry range is set to integers with an unknown upper limit. With counts = FALSE, I was expecting the mtry range to be a double on [0, 1]. Is there a way to tune mtry as a proportion using tune_sim_anneal()?


library(tidymodels)

xgb_reg <-
  boost_tree(
    mtry = tune(),
    trees = tune(),
    min_n = tune(),
    tree_depth = tune(),
    learn_rate = tune(),
    loss_reduction = tune(),
    sample_size = tune()) %>%
  set_engine("xgboost", counts = FALSE) %>%
  set_mode("regression")  # mode assumed from the object name xgb_reg

# inspecting the mtry object below shows that it has type integer,
# with a lower limit of 1L and an unknown upper limit
extract_parameter_set_dials(xgb_reg) %>%
  filter(name == "mtry") %>%
  pull(object)

Hi @mbanghart!

If you'd like to tune over mtry with simulated annealing, you can:

  • set counts = FALSE and pass a custom parameter set to param_info, or
  • leave the counts argument at its default and first tune over a grid, which finalizes the unknown upper limit of mtry, then pass those results to simulated annealing via initial

Here's some example code demonstrating tuning on mtry with simulated annealing.


library(tidymodels)
library(finetune)  # provides tune_sim_anneal()

data(penguins, package = "modeldata")

# as a proportion:
bt_tune_prop <-
  boost_tree(mtry = tune()) %>%
  set_engine(engine = "xgboost", counts = FALSE) %>%
  set_mode(mode = "classification")

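grid_anneal_prop <-
  tune_sim_anneal(
    bt_tune_prop,
    species ~ flipper_length_mm + island,
    # resampling scheme assumed; the original resamples argument is missing
    resamples = bootstraps(penguins),
    param_info = 
      extract_parameter_set_dials(bt_tune_prop) %>% 
      update(mtry = mtry_prop())
  )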
#> ❯  Generating a set of 1 initial parameter results
#> ✓ Initialization complete
#> Optimizing roc_auc
#> Initial best: 0.95642
#>  1 ◯ accept suboptimal  roc_auc=0.95613  (+/-0.00232)
#>  2 ♥ new best           roc_auc=0.9568   (+/-0.002101)
#>  3 ◯ accept suboptimal  roc_auc=0.95566  (+/-0.002179)
#>  4 ♥ new best           roc_auc=0.96007  (+/-0.002062)
#>  5 ♥ new best           roc_auc=0.96007  (+/-0.002045)
#>  6 ♥ new best           roc_auc=0.96185  (+/-0.002087)
#>  7 ◯ accept suboptimal  roc_auc=0.96165  (+/-0.002138)
#>  8 ◯ accept suboptimal  roc_auc=0.96149  (+/-0.002164)
#>  9 ♥ new best           roc_auc=0.96191  (+/-0.001738)
#> 10 ◯ accept suboptimal  roc_auc=0.96098  (+/-0.001949)
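
As a side note, the difference between the two parameter definitions can be checked directly in dials. The snippet below is a minimal sketch and not part of the reprex: it compares the default mtry() object with mtry_prop(). The names count_param, prop_param and narrow_param are made up for illustration, and the bounds 0.2 and 0.8 are arbitrary.

```r
library(dials)

# Default count-based parameter, the kind reported for mtry = tune()
count_param <- mtry()
count_param$type            # integer-valued
has_unknowns(count_param)   # TRUE: the upper limit depends on the data

# Proportion-based parameter, as swapped in with update(mtry = mtry_prop())
prop_param <- mtry_prop()
prop_param$type             # double-valued
has_unknowns(prop_param)    # FALSE: the range is fully specified up front

# The range can be narrowed if only part of the interval is of interest
# (0.2 and 0.8 are arbitrary illustrative bounds)
narrow_param <- mtry_prop(range = c(0.2, 0.8))
range_get(narrow_param)
```

Since the proportion parameter has no unknown bounds, it needs no finalization step before being passed through param_info.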


# as a count:
bt_tune_count <-
  boost_tree(mtry = tune()) %>%
  set_engine(engine = "xgboost") %>%
  set_mode(mode = "classification")

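# resampling scheme assumed; the original resamples argument is missing
boots <- bootstraps(penguins)

grid <-
  tune_grid(bt_tune_count, species ~ flipper_length_mm + island, resamples = boots)
#> i Creating pre-processing data to finalize unknown parameter: mtry

grid_anneal <-
  tune_sim_anneal(
    bt_tune_count,
    species ~ flipper_length_mm + island,
    resamples = boots,
    initial = grid
  )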
#> Optimizing roc_auc
#> Initial best: 0.96087
#>  1 ◯ accept suboptimal  roc_auc=0.95825  (+/-0.001964)
#>  2 ◯ accept suboptimal  roc_auc=0.95396  (+/-0.001823)
#>  3 ◯ accept suboptimal  roc_auc=0.94992  (+/-0.001875)
#>  4 + better suboptimal  roc_auc=0.95342  (+/-0.001994)
#>  5 ◯ accept suboptimal  roc_auc=0.94992  (+/-0.001875)
#>  6 + better suboptimal  roc_auc=0.95342  (+/-0.002041)
#>  7 ◯ accept suboptimal  roc_auc=0.94992  (+/-0.001875)
#>  8 ✖ restart from best  roc_auc=0.95408  (+/-0.002)
#>  9 ◯ accept suboptimal  roc_auc=0.95773  (+/-0.001989)
#> 10 ◯ accept suboptimal  roc_auc=0.95432  (+/-0.002001)
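
If you'd rather not run a preliminary grid just to resolve the upper limit, another option that should work is to finalize the count-based parameter yourself with dials::finalize() and pass the result through param_info. This is a sketch under assumptions, not something run in the reprex above. The names predictors and mtry_count are hypothetical. Finalizing on the raw predictor columns sets the upper limit to the number of columns before any dummy encoding, which may differ from what tune computes internally.

```r
library(dials)

data(penguins, package = "modeldata")

# Predictor columns used in the formula above (object name is illustrative)
predictors <- penguins[, c("flipper_length_mm", "island")]

# finalize() replaces the unknown upper limit with the number of columns
mtry_count <- finalize(mtry(), predictors)
has_unknowns(mtry_count)   # FALSE
range_get(mtry_count)      # lower 1, upper 2

# The finalized object could then be supplied to tune_sim_anneal() via
#   param_info = extract_parameter_set_dials(bt_tune_count) %>%
#     update(mtry = mtry_count)
```

With only two predictors, the count-based search space is very small, so the proportion approach is likely the more useful one for wider data.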


If this doesn't do the trick for you, could you modify this code to demonstrate the functionality you're hoping to see?


Thanks. Using update() with mtry_prop() was just what I needed.

