Background: I am trying to do a Bayesian hyperparameter search for boosted tree based models (xgboost / LightGBM) using tune::tune_bayes(). There are a number of hyperparameters I want to tune, but one is sort of special: trees. This is already recognized to some extent by the tune package: for example, if I create a grid of hyperparameters and for each set ask for all values of trees from 1 to 1000, then tune_grid understands that it should just create a model with 1000 trees and should additionally evaluate (and save the evaluation) after each additional tree has been created (i.e. by trying trees=1000 you can evaluate 1:999 almost for free at the same time). That's of course rather specific to boosting, because of the way the trees are sequentially created by looking at what the previous ones still get wrong.
What is my problem / am I right that this is a problem? However, I cannot figure out (maybe I overlooked something) how to get tune_bayes() to behave in such a way. When I specify that trees should be tuned - e.g. via boost_tree(mtry = tune(), trees = tune()) - then tune_bayes() seems to try one specific value of trees, such as 277, and only retains the metric of interest for trees=277 (but not for 1:276). If I specify a fixed number of trees with boost_tree(mtry = tune(), trees = 1000), then I just get the results for trees=1000 (but not for 1:999). Or did I misunderstand what tune_bayes() does and it actually does what I want (in which case a clearer description in the documentation would probably be good)?
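To illustrate the two specifications compared above (a sketch using the same made-up dat / y as before; the iteration count is arbitrary):

```r
# (a) trees is tuned: each Bayesian iteration proposes a single trees value
#     (e.g. 277) and only that value's metrics are kept.
spec_tuned <- boost_tree(mtry = tune(), trees = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

# (b) trees is fixed: every candidate is evaluated only at trees = 1000.
spec_fixed <- boost_tree(mtry = tune(), trees = 1000) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

res_tuned <- tune_bayes(
  workflow() %>% add_model(spec_tuned) %>% add_formula(y ~ .),
  resamples = vfold_cv(dat, v = 5),
  iter = 25
)
```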
Main question: If my understanding above is correct, then it is non-ideal that tune_bayes() does not explore trees efficiently, both in terms of finding a good value for trees and in terms of all the other parameters (you may have picked a great combination, but just cannot see it because of a poor choice for trees). I'd be quite happy to specify an upper bound myself for trees; I just want to also automatically evaluate all values below that. Any pointers on how to get tune_bayes() to do that?
One - somewhat inferior - approach for xgboost: One could use trees=1100, stop_iter=100 to do early stopping, in which case I believe the validation score at the best iteration is returned (so effectively getting me the validation metric for the trees value at which the validation metric was lowest, right?). I just need to be a bit careful about my choice of stop_iter: in particular, it must not be too small, so that we do not wrongly stop early when the model was just fluctuating a bit and would have improved later. Additionally, there is the awkward scenario where the performance at iteration 1100 is a lot worse than at 1050, but early stopping has not occurred, so we get the performance for trees=1100 back, I think. I.e. one probably has to make trees quite large, which could then occasionally waste some time. I believe this is not an option for LightGBM at the moment, because treesnip does not support stop_iter yet.
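A sketch of that workaround (hedged: I am assuming the xgboost engine's validation engine argument, which holds out a fraction of the analysis set for the early-stopping monitor; dat / y and the specific numbers are again made up):

```r
spec_es <- boost_tree(
    mtry = tune(),
    trees = 1100,     # deliberately large upper bound
    stop_iter = 100   # stop after 100 rounds without improvement
  ) %>%
  set_engine("xgboost", validation = 0.2) %>%
  set_mode("regression")

res_es <- tune_bayes(
  workflow() %>% add_model(spec_es) %>% add_formula(y ~ .),
  resamples = vfold_cv(dat, v = 5),
  iter = 25
)
```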