I was following the guide here https://www.tidymodels.org/start/tuning/ on tuning models using tidymodels but wanted to try it using a bagged tree model. However, when I try and tune the bagged tree model I get the warning message:
Warning message: All models failed in tune_grid(). See the `.notes` column.
In the notes column each entry is:
"internal: Error in rlang::env_get(mod_env, items): argument \"default\" is missing, with no default"
The only thing different about my code from the guide is the model type and the model will fit if I specify the parameters directly, but I am unable to tune the model and I'm not sure why. Nor can I find any posts of others having a similar problem with baguette or rpart using tune_grid.
library(tidymodels)
library(baguette)

# Bagged tree spec with tree_depth marked for tuning
bag_spec <-
  bag_tree(tree_depth = tune()) %>%
  set_mode("regression") %>%
  set_engine("rpart", times = 25)

# Regular grid over tree_depth
bag_grid <- grid_regular(
  tree_depth(),
  levels = 10
)

bag_wf <- workflow() %>%
  add_formula(QUAL_SCORE_y0 ~ .) %>%
  add_model(bag_spec)

# 10-fold cross-validation on the training data
vb_folds <- vfold_cv(df_training)

doParallel::registerDoParallel()

bag_res <- tune_grid(
  bag_wf,
  resamples = vb_folds,
  grid = bag_grid
)
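When every model fails, the individual messages sit in the `.notes` list-column of the tuning result, one small table per resample. The following is a minimal base R sketch of one way to collect the distinct messages. `note_messages()` is a hypothetical helper, not part of tune, and `mock_res` is a hand-built stand-in for a real `tune_grid()` result; the sketch assumes each element of `.notes` is a data frame whose columns can be flattened to character.

```r
# Hypothetical helper (not part of tune): gather the distinct messages
# stored in the `.notes` list-column of a tuning result.
note_messages <- function(res) {
  per_fold <- lapply(res$.notes, function(notes) {
    # Flatten every column of the per-fold notes table to character.
    as.character(unlist(notes, use.names = FALSE))
  })
  unique(unlist(per_fold))
}

# Stand-in for a real tune_grid() result, built by hand for illustration.
err <- 'internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default'
mock_res <- data.frame(id = c("Fold01", "Fold02"), stringsAsFactors = FALSE)
mock_res$.notes <- list(
  data.frame(.notes = err, stringsAsFactors = FALSE),
  data.frame(.notes = err, stringsAsFactors = FALSE)
)

# Both folds carry the same message, so it is reported once.
note_messages(mock_res)
```

With a real result the call would be `note_messages(bag_res)`; seeing a single repeated message, as here, suggests a systematic failure rather than a problem with particular folds.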
Max
October 5, 2020, 9:06pm
2
There is currently a bug that is halfway squashed. It is related to using PSOCK clusters (e.g. doParallel::registerDoParallel()).
What OS are you on?
OS is Windows 10, I'll try the same code without the doParallel call and see if it works
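One caveat when testing without the parallel backend: in a session where `registerDoParallel()` has already been called, deleting the line from the script does not unregister it. Restarting R works, and so does registering the sequential backend explicitly. A minimal sketch, assuming the foreach package is installed and that the tune version in use dispatches its work through foreach:

```r
library(foreach)

# Replace whatever backend is currently registered with the sequential one.
registerDoSEQ()

# Inspect what %dopar% would use from here on.
backend_name <- getDoParName()
worker_count <- getDoParWorkers()
backend_name
worker_count
```

If `worker_count` is 1, any later `tune_grid()` call in the same session should run sequentially, which helps separate parallel-backend bugs from problems in the model or data.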
Max
October 5, 2020, 10:47pm
4
Can you run remotes::install_dev("baguette") and see if it works then?
So I tried this and took the doParallel::registerDoParallel() call off of my code and encountered a new error.
When I try and fit one bagged tree using:
bag_spec <-
  bag_tree(tree_depth = 5) %>%
  set_mode("regression") %>%
  set_engine("rpart", times = 25) %>%
  fit(QUAL_SCORE_y0 ~ ., data = df_training)
tidymodels fits the model without error, but when I try and use the above code for any parameter tuning I get:
x Fold01: model 2/25: Error: All of the models failed. An example message was:
Error in `[.data.frame`(m, labs) : undefined columns selected
I have tried the solution found here to no avail: classification - R caret rpart returns Error in `[.data.frame`(m, labs) : undefined columns selected - Stack Overflow
Max
October 6, 2020, 8:17pm
6
It's hard to know if this is a code issue or a package issue. Can you run this reprex?
library(tidymodels)
library(baguette)
library(doParallel)
registerDoParallel()
bagged <-
  bag_tree(cost_complexity = tune()) %>%
  set_engine("rpart", times = 5) %>%
  set_mode("regression")
set.seed(1)
folds <- vfold_cv(mtcars)
set.seed(2)
tuned <-
  bagged %>%
  tune_grid(mpg ~ ., folds, grid = 3)
tuned
registerDoParallel()
bagged <-
+ bag_tree(cost_complexity = tune()) %>%
+ set_engine("rpart", times = 5) %>%
+ set_mode("regression")
set.seed(1)
folds <- vfold_cv(mtcars)
set.seed(2)
tuned <-
+ bagged %>%
+ tune_grid(mpg ~ ., folds, grid = 3)
Warning message:
All models failed in tune_grid(). See the `.notes` column.
tuned
# Tuning results
# 10-fold cross-validation
# A tibble: 10 x 4
   splits         id     .metrics .notes
   <list>         <chr>  <list>   <list>
 1 <split [28/4]> Fold01 <NULL>   <tibble [1 x 1]>
 2 <split [28/4]> Fold02 <NULL>   <tibble [1 x 1]>
 3 <split [29/3]> Fold03 <NULL>   <tibble [1 x 1]>
 4 <split [29/3]> Fold04 <NULL>   <tibble [1 x 1]>
 5 <split [29/3]> Fold05 <NULL>   <tibble [1 x 1]>
 6 <split [29/3]> Fold06 <NULL>   <tibble [1 x 1]>
 7 <split [29/3]> Fold07 <NULL>   <tibble [1 x 1]>
 8 <split [29/3]> Fold08 <NULL>   <tibble [1 x 1]>
 9 <split [29/3]> Fold09 <NULL>   <tibble [1 x 1]>
10 <split [29/3]> Fold10 <NULL>   <tibble [1 x 1]>
Warning message:
This tuning result has notes. Example notes on model fitting include:
internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default
internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default
internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default
Max
October 6, 2020, 8:22pm
8
Ok. How about, after loading all of the packages, run sessioninfo::session_info()?
Sure no problem:
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doParallel_1.0.15 iterators_1.0.12 foreach_1.5.0 baguette_0.0.1.9000 yardstick_0.0.7 workflows_0.2.0 tune_0.1.1 tidyr_1.1.2 tibble_3.0.3 rsample_0.0.8 recipes_0.1.13 purrr_0.3.4 parsnip_0.1.3
[14] modeldata_0.0.2 infer_0.5.3 ggplot2_3.3.2 dplyr_1.0.2 dials_0.0.9 scales_1.1.1 broom_0.7.0 tidymodels_0.1.1
loaded via a namespace (and not attached):
[1] splines_4.0.2 prodlim_2019.11.13 Formula_1.2-3 assertthat_0.2.1 GPfit_1.0-8 globals_0.13.0 ipred_0.9-9 pillar_1.4.6 backports_1.1.10 lattice_0.20-41 glue_1.4.2 pROC_1.16.2 digest_0.6.25 hardhat_0.1.4
[15] colorspace_1.4-1 Matrix_1.2-18 plyr_1.8.6 timeDate_3043.102 pkgconfig_2.0.3 lhs_1.1.0 DiceDesign_1.8-1 earth_5.2.0 listenv_0.8.0 mvtnorm_1.1-1 gower_0.2.2 lava_1.6.8 Cubist_0.2.3 TeachingDemos_2.12
[29] generics_0.0.2 ellipsis_0.3.1 withr_2.3.0 furrr_0.1.0 nnet_7.3-14 cli_2.0.2 survival_3.1-12 magrittr_1.5 crayon_1.3.4 future_1.19.1 fansi_0.4.1 MASS_7.3-51.6 class_7.3-17 tools_4.0.2
[43] lifecycle_0.2.0 stringr_1.4.0 munsell_0.5.0 plotrix_3.7-8 compiler_4.0.2 inum_1.0-1 rlang_0.4.7 plotmo_3.6.0 grid_4.0.2 rstudioapi_0.11 C50_0.1.3.1 partykit_1.2-9 gtable_0.3.0 codetools_0.2-16
[57] reshape2_1.4.4 R6_2.4.1 lubridate_1.7.9 libcoin_1.0-6 stringi_1.5.3 Rcpp_1.0.5 vctrs_0.3.4 rpart_4.1-15 tidyselect_1.1.0
Max
October 6, 2020, 8:25pm
10
This gives a lot more information:
Sorry about that. Here it is:
- Session info ---------------------------------------------------------------
setting value
version R version 4.0.2 (2020-06-22)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/Los_Angeles
date 2020-10-06
- Packages -------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
backports 1.1.10 2020-09-15 [1] CRAN (R 4.0.2)
baguette * 0.0.1.9000 2020-10-06 [1] Github (tidymodels/baguette@25ad7af)
broom * 0.7.0 2020-07-09 [1] CRAN (R 4.0.2)
C50 0.1.3.1 2020-05-26 [1] CRAN (R 4.0.2)
class 7.3-17 2020-04-26 [2] CRAN (R 4.0.2)
cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.2)
codetools 0.2-16 2018-12-24 [2] CRAN (R 4.0.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
Cubist 0.2.3 2020-01-10 [1] CRAN (R 4.0.2)
dials * 0.0.9 2020-09-16 [1] CRAN (R 4.0.2)
DiceDesign 1.8-1 2019-07-31 [1] CRAN (R 4.0.2)
digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.2)
doParallel * 1.0.15 2019-08-02 [1] CRAN (R 4.0.2)
dplyr * 1.0.2 2020-08-18 [1] CRAN (R 4.0.2)
earth 5.2.0 2020-09-16 [1] CRAN (R 4.0.2)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
foreach * 1.5.0 2020-03-30 [1] CRAN (R 4.0.2)
Formula 1.2-3 2018-05-03 [1] CRAN (R 4.0.0)
furrr 0.1.0 2018-05-16 [1] CRAN (R 4.0.2)
future 1.19.1 2020-09-22 [1] CRAN (R 4.0.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.2)
ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.2)
globals 0.13.0 2020-09-17 [1] CRAN (R 4.0.2)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
gower 0.2.2 2020-06-23 [1] CRAN (R 4.0.2)
GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.0.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
hardhat 0.1.4 2020-07-02 [1] CRAN (R 4.0.2)
infer * 0.5.3 2020-07-14 [1] CRAN (R 4.0.2)
inum 1.0-1 2019-04-25 [1] CRAN (R 4.0.2)
ipred 0.9-9 2019-04-28 [1] CRAN (R 4.0.2)
iterators * 1.0.12 2019-07-26 [1] CRAN (R 4.0.2)
lattice 0.20-41 2020-04-02 [2] CRAN (R 4.0.2)
lava 1.6.8 2020-09-26 [1] CRAN (R 4.0.2)
lhs 1.1.0 2020-09-29 [1] CRAN (R 4.0.2)
libcoin 1.0-6 2020-08-14 [1] CRAN (R 4.0.2)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.2)
listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.2)
lubridate 1.7.9 2020-06-08 [1] CRAN (R 4.0.2)
magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
MASS 7.3-51.6 2020-04-26 [2] CRAN (R 4.0.2)
Matrix 1.2-18 2019-11-27 [2] CRAN (R 4.0.2)
modeldata * 0.0.2 2020-06-22 [1] CRAN (R 4.0.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
mvtnorm 1.1-1 2020-06-09 [1] CRAN (R 4.0.0)
nnet 7.3-14 2020-04-26 [2] CRAN (R 4.0.2)
parsnip * 0.1.3 2020-08-04 [1] CRAN (R 4.0.2)
partykit 1.2-9 2020-07-10 [1] CRAN (R 4.0.2)
pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
plotmo 3.6.0 2020-09-13 [1] CRAN (R 4.0.2)
plotrix 3.7-8 2020-04-16 [1] CRAN (R 4.0.0)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.2)
pROC 1.16.2 2020-03-19 [1] CRAN (R 4.0.2)
prodlim 2019.11.13 2019-11-17 [1] CRAN (R 4.0.2)
purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.2)
Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2)
recipes * 0.1.13 2020-06-23 [1] CRAN (R 4.0.2)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.2)
rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
rpart 4.1-15 2019-04-12 [1] CRAN (R 4.0.2)
rsample * 0.0.8 2020-09-23 [1] CRAN (R 4.0.2)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.2)
scales * 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
survival 3.1-12 2020-04-10 [2] CRAN (R 4.0.2)
TeachingDemos 2.12 2020-04-07 [1] CRAN (R 4.0.2)
tibble * 3.0.3 2020-07-10 [1] CRAN (R 4.0.2)
tidymodels * 0.1.1 2020-07-14 [1] CRAN (R 4.0.2)
tidyr * 1.1.2 2020-08-27 [1] CRAN (R 4.0.2)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
timeDate 3043.102 2018-02-21 [1] CRAN (R 4.0.2)
tune * 0.1.1 2020-07-08 [1] CRAN (R 4.0.2)
vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2)
withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2)
workflows * 0.2.0 2020-09-15 [1] CRAN (R 4.0.2)
yardstick * 0.0.7 2020-07-13 [1] CRAN (R 4.0.2)
Max
October 6, 2020, 8:28pm
12
Thanks! How about using remotes::install_dev("tune") and trying again?
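After installing development versions it can help to confirm, in a fresh session, which versions are actually being picked up. A small base R sketch; the package list is only an illustration, and packages that are not installed are skipped rather than raising an error:

```r
# Packages of interest in this thread (adjust as needed).
pkgs <- c("tune", "baguette", "parsnip")

# Keep only the ones that can be loaded on this machine.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Named character vector of installed versions, e.g. to paste into a reply.
versions <- vapply(
  pkgs[installed],
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)
versions
```

A development build typically shows a four-part version ending in something like `.9000`, as baguette does in the session info above.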
The reprex you provided runs just fine now, but on my dataframe I still get the following error:
x Fold01: model 1/25: Error: All of the models failed. An example message was:
Error in `[.data.frame`(m, labs) : undefined columns selected
It seems to have a problem with my column names but I can't think why that would be the case and only happen during tune, not when a single bag model is trained. Here are the column names:
[1] "HQ" "Facility_Type" "QUAL_SCORE_y0" "QUAL_SCORE_y1" "QUAL_SCORE_y2" "QUAL_SCORE_y3" "ERV_y0" "ERV_y1" "ERV_y2" "ERV_y3" "Rating_Method_y0" "Rating_Method_y1"
[13] "Rating_Method_y2" "Rating_Method_y3" "SUST_ACF" "SUST_RQMT" "PRORATED_SERVICE_PRV" "FSM_CALC_PRV" "Facility_Age_yrs" "sc_chg_1" "sc_chg_2" "all_score" "EVR_chg_0" "EVR_chg_1"
[25] "EVR_chg_2" "all_erv"
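For context on the message itself: `undefined columns selected` is what base R raises when a data frame is subset by a column name it does not contain. The sketch below reproduces only that generic mechanism with made-up data. The names `m` and `labs` simply mirror the call shown in the error, and nothing here identifies where in rpart or baguette the mismatch arises.

```r
# Made-up data frame and label vector, named after the call in the error.
m <- data.frame(a = 1:3, b = 4:6)
labs <- c("a", "not_a_column")

# Which requested labels are absent from the data frame?
missing_labs <- setdiff(labs, names(m))

# Subsetting by a missing name raises the error; capture its message.
msg <- tryCatch(
  {
    m[labs]
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

missing_labs
msg
```

By the same logic, the error during tuning points to some step asking for a column that is not present in the data it receives, even when the column names themselves are perfectly valid.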
I am getting exactly the same error. Earlier on a custom model was working perfectly fine; now I am suddenly getting this error. I haven't created a reprex. My first thought was to search online for the error, and I discovered that this is a current ongoing conversation (i.e., within the last few minutes).
I don't believe any changes I made to my code should have caused the error. I don't think I updated tune or any of the tidymodels packages since it was working, but it is possible that I did. I am running on Linux, not Windows, and using the GitHub version of tidymodels.
Max
October 6, 2020, 8:50pm
15
I don't see anything wrong with those names. Does it run sequentially?
The code doesn't run sequentially or in parallel, but the model will run when not tuning, i.e. if I specify tree_depth and then pipe to fit rather than tuning.
Here is an example of the errors from the .notes column:
Max
October 6, 2020, 9:06pm
17
I don't know that I can help more without being able to run an example locally that has the same issue.
GraemeS
October 6, 2020, 11:40pm
18
I think the problem is that allow_sparse_x is now a required parameter in the options of set_encoding when defining a custom model, and doesn't have a default value. It is late at night, so I'm not going to investigate this further, but I think this may be the cause of the error, in my case anyway.
It's a frustrating error because it only occurs when using tune and not when fitting a single model. I suppose I could make my own wrapper to search the hyperparameters but that defeats one of the main reasons for switching to tidymodels.
I even tried
library(janitor)
df <- clean_names(df)
as well as stripping out "_" and all numbers from column names but nothing works. There appears to be something wrong with how tune and rpart are interacting with column names but I have not found a solution so far.
FWIW I am seeing the same original error (rlang::env_get......) when trying to run a naive_Bayes() model in parallel. It runs without the error sequentially. (Ubuntu)