Hi, all. I'm confused about how to tune the scale_pos_weight hyperparameter for xgboost models in tidymodels. I've read the documentation but am clearly not implementing it correctly. Would love help with this one!
What I've tried: putting it in the engine call, i.e. set_engine("xgboost", scale_pos_weight = tune()).
I know that I can pass a fixed scale_pos_weight value to xgboost via set_engine(), but I'm stumped as to how to tune it, even though the closed issues on GitHub make it clear that this is possible.
I still don't quite understand this. If I put the scale_pos_weight = tune() statement in the set_engine part of the model specification, how do I actually have it tune over some space when using the default (unspecified) grid?
The above fails with: Warning message: This tuning result has notes. Example notes on model fitting include: internal: Error: Can't subset columns that don't exist. ✖ Column `scale_pos_weight` doesn't exist.
Is there a default range over which tidymodels will search the scale_pos_weight space? I do know that the general recommendation is the ratio of negative to positive cases (neg/pos).
How do I specify the range for one hyperparameter if I want to make use of the defaults for other hyperparameters?
Where can I find what space is being considered for each hyperparameter?
Thanks for your patience with my basic questions here! I have spent loads of time at the wonderful pkgdown site but I don't always end up finding all of the relevant bits. That's obviously on me: it's beautifully organized, and I apologize for asking questions that are answered there. I'm always happy to read any tutorials and documentation that you link to.
Information on parameters: For parameters that have an unknown() bound, how can I see what value was filled in (in one of the resulting objects), and where does the documentation give the rule for calculating, e.g., the max value for mtry? I assume the max value for mtry is the number of columns, but other parameters that I haven't touched, like sample_size, also have unknown() listed, and I would love to know where to see how default values are chosen when the user doesn't specify them.
My tidymodels seems to be up to date, so I am sure this is a user error, either in where I'm specifying scale_pos_weight = tune() or in my not specifying a range.
Custom range for scale_pos_weight: The default scale_pos_weight range doesn't work for me, but I haven't yet made sense of what needs to happen to provide a custom range (e.g., do I need finalize()?). Most of the examples I've seen have hyperparameters that are either tuned over default ranges or set to fixed values, not tuned over custom ranges.
library(mlbench)
library(forcats)
library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.0.5
#> Warning in system("timedatectl", intern = TRUE): running command 'timedatectl'
#> had status 1
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
#> Warning: package 'broom' was built under R version 4.0.5
#> Warning: package 'dials' was built under R version 4.0.5
#> Warning: package 'infer' was built under R version 4.0.5
#> Warning: package 'modeldata' was built under R version 4.0.5
#> Warning: package 'parsnip' was built under R version 4.0.5
#> Warning: package 'recipes' was built under R version 4.0.5
#> Warning: package 'rsample' was built under R version 4.0.5
#> Warning: package 'tibble' was built under R version 4.0.5
#> Warning: package 'tune' was built under R version 4.0.5
#> Warning: package 'workflows' was built under R version 4.0.5
#> Warning: package 'workflowsets' was built under R version 4.0.5
library(finetune)
#> Warning: package 'finetune' was built under R version 4.0.5
data("PimaIndiansDiabetes")
set.seed(24)
df <- PimaIndiansDiabetes %>%
  mutate(diabetes = fct_relevel(diabetes, "pos"))
xgb_rec <- recipe(diabetes ~ ., data = df)
xgb_spec <- boost_tree(trees = tune()) %>%
  set_engine("xgboost", scale_pos_weight = tune()) %>%
  set_mode("classification")
resamples_cv <- vfold_cv(df, v = 5)
my_metrics <- metric_set(mn_log_loss, roc_auc, pr_auc)
xgb_wf <- workflow() %>%
  add_recipe(xgb_rec) %>%
  add_model(xgb_spec)
xgb_rs <- tune_race_anova(
  xgb_wf,
  resamples = resamples_cv,
  grid = 10,
  metrics = my_metrics,
  control = control_race(verbose_elim = TRUE)
)
#> Error: The workflow has arguments to be tuned that are missing some parameter objects: 'scale_pos_weight'
sessionInfo()
#> R version 4.0.4 (2021-02-15)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CloudForms
#>
#> Matrix products: default
#> BLAS: /usr/local/lib64/R/lib/libRblas.so
#> LAPACK: /usr/local/lib64/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] finetune_0.1.0 yardstick_0.0.8 workflowsets_0.1.0 workflows_0.2.4
#> [5] tune_0.1.6 tidyr_1.1.4 tibble_3.1.5 rsample_0.1.1
#> [9] recipes_0.1.17 purrr_0.3.4 parsnip_0.1.7 modeldata_0.1.1
#> [13] infer_1.0.0 ggplot2_3.3.5 dplyr_1.0.7 dials_0.0.10
#> [17] scales_1.1.1 broom_0.7.10 tidymodels_0.1.4 forcats_0.5.1
#> [21] mlbench_2.1-3
#>
#> loaded via a namespace (and not attached):
#> [1] nlme_3.1-152 fs_1.5.0 lubridate_1.7.10 DiceDesign_1.9
#> [5] tools_4.0.4 backports_1.2.1 utf8_1.2.2 R6_2.5.1
#> [9] rpart_4.1-15 DBI_1.1.1 colorspace_2.0-2 nnet_7.3-15
#> [13] withr_2.4.2 tidyselect_1.1.1 compiler_4.0.4 cli_3.1.0
#> [17] stringr_1.4.0 digest_0.6.28 minqa_1.2.4 rmarkdown_2.11
#> [21] pkgconfig_2.0.3 htmltools_0.5.2 parallelly_1.24.0 lme4_1.1-27.1
#> [25] styler_1.4.1 lhs_1.1.1 fastmap_1.1.0 highr_0.9
#> [29] rlang_0.4.12 rstudioapi_0.13 generics_0.1.1 jsonlite_1.7.2
#> [33] magrittr_2.0.1 Matrix_1.3-2 Rcpp_1.0.7 munsell_0.5.0
#> [37] fansi_0.5.0 GPfit_1.0-8 lifecycle_1.0.1 furrr_0.2.2
#> [41] stringi_1.7.5 pROC_1.17.0.1 yaml_2.2.1 MASS_7.3-53
#> [45] plyr_1.8.6 grid_4.0.4 parallel_4.0.4 listenv_0.8.0
#> [49] crayon_1.4.2 lattice_0.20-41 splines_4.0.4 knitr_1.36
#> [53] pillar_1.6.3 boot_1.3-26 xgboost_1.4.1.1 codetools_0.2-18
#> [57] reprex_2.0.0 glue_1.5.1 evaluate_0.14 data.table_1.14.0
#> [61] nloptr_1.2.2.2 vctrs_0.3.8 foreach_1.5.1 gtable_0.3.0
#> [65] future_1.21.0 assertthat_0.2.1 xfun_0.28 gower_0.2.2
#> [69] prodlim_2019.11.13 class_7.3-18 survival_3.2-7 timeDate_3043.102
#> [73] iterators_1.0.13 hardhat_0.1.6 lava_1.6.9 globals_0.14.0
#> [77] ellipsis_0.3.2 ipred_0.9-12