Not able to evaluate models with tidymodels

boringstuff · January 6, 2021, 4:10pm

Hi,

I'm following the classification case study (Chapter 7) from Supervised Machine Learning for Text Analysis in R by Emil Hvitfeldt and Julia Silge. The dataset can be downloaded here: finance complaints.

I installed the required packages and am trying to reproduce the results. Everything works until I fit the workflow to the training data, which returns "Warning message: naive_bayes(): y has less than two classes."

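library(tidymodels)
library(discrim)

# Naive Bayes specification using the naivebayes engine
nb_spec <- naive_Bayes() %>%
  set_mode("classification") %>%
  set_engine("naivebayes")
nb_spec

# Fit on the full training set (complaint_wf is the recipe-only workflow from the book)
nb_fit <- complaint_wf %>%
  add_model(nb_spec) %>%
  fit(data = complaints_train)

# 10-fold cross-validation folds (the vfold_cv() default)
set.seed(234)
complaints_folds <- vfold_cv(complaints_train)

complaints_folds

# Workflow combining the recipe and the model
nb_wf <- workflow() %>%
  add_recipe(complaints_rec) %>%
  add_model(nb_spec)

nb_wf

# Fit and evaluate on each fold, keeping the predictions
nb_rs <- fit_resamples(
  nb_wf,
  complaints_folds,
  control = control_resamples(save_pred = TRUE)
)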

After creating the folds, the model cannot be evaluated and fit_resamples() returns
"! Fold01: preprocessor 1/1, model 1/1: naive_bayes(): y has less than two classes.
x Fold01: internal: Error: In metric: accuracy
Problem with summarise() input .estimate.
x `estima..."

I have no idea what went wrong...

Thanks.

My R sessionInfo.

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] stopwords_2.1     naivebayes_0.9.7  vctrs_0.3.6       rlang_0.4.10      discrim_0.1.1     textrecipes_0.4.0
 [7] yardstick_0.0.7   workflows_0.2.1   tune_0.1.2        rsample_0.0.8     recipes_0.1.15    parsnip_0.1.4    
[13] modeldata_0.1.0   infer_0.5.3       dials_0.0.9       scales_1.1.1      broom_0.7.3       tidymodels_0.1.2 
[19] forcats_0.5.0     stringr_1.4.0     dplyr_1.0.2       purrr_0.3.4       readr_1.4.0       tidyr_1.1.2      
[25] tibble_3.0.4      ggplot2_3.3.3     tidyverse_1.3.0  

loaded via a namespace (and not attached):
 [1] fs_1.5.0           usethis_2.0.0      lubridate_1.7.9.2  DiceDesign_1.8-1   httr_1.4.2        
 [6] SnowballC_0.7.0    tools_4.0.3        backports_1.2.1    utf8_1.1.4         R6_2.5.0          
[11] rpart_4.1-15       DBI_1.1.0          colorspace_2.0-0   nnet_7.3-14        withr_2.3.0       
[16] tidyselect_1.1.0   compiler_4.0.3     cli_2.2.0          rvest_0.3.6        xml2_1.3.2        
[21] digest_0.6.27      pkgconfig_2.0.3    parallelly_1.23.0  lhs_1.1.1          dbplyr_2.0.0      
[26] readxl_1.3.1       rstudioapi_0.13    generics_0.1.0     jsonlite_1.7.2     tokenizers_0.2.1  
[31] magrittr_2.0.1     Matrix_1.3-0       Rcpp_1.0.5         munsell_0.5.0      fansi_0.4.1       
[36] GPfit_1.0-8        lifecycle_0.2.0    furrr_0.2.1        stringi_1.5.3      pROC_1.16.2       
[41] MASS_7.3-53        plyr_1.8.6         grid_4.0.3         parallel_4.0.3     listenv_0.8.0     
[46] crayon_1.3.4       lattice_0.20-41    haven_2.3.1        splines_4.0.3      hms_0.5.3         
[51] pillar_1.4.7       codetools_0.2-18   reprex_0.3.0       glue_1.4.2         modelr_0.1.8      
[56] foreach_1.5.1      cellranger_1.1.0   gtable_0.3.0       future_1.21.0      assertthat_0.2.1  
[61] gower_0.2.2        prodlim_2019.11.13 class_7.3-17       survival_3.2-7     timeDate_3043.102 
[66] iterators_1.0.13   hardhat_0.1.5      lava_1.6.8.1       globals_0.14.0     ellipsis_0.3.1    
[71] ipred_0.9-9

Max · January 7, 2021, 4:26pm

This model requires that all the levels of the outcome be available in each unique level of the categorical predictors. For example, if predictor "A" only has a single class, the model will fail.

Can you screen the data to see if that's the case? I would, but there isn't a reprex, since I can't see the code that created complaints_train.
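
A minimal sketch of that kind of screening, in base R. The small data frame below is a hypothetical stand-in for complaints_train, and the column names product (outcome) and state (a categorical predictor) are assumptions for illustration, not taken from the original code.

```r
# Hypothetical stand-in for complaints_train; the real data and
# column names may differ.
complaints_train <- data.frame(
  product = factor(c("Other", "Other", "Other", "Other")),
  state   = factor(c("CA", "NY", "CA", "TX"))
)

# Rows per outcome class
outcome_counts <- table(complaints_train$product)
print(outcome_counts)

# Number of outcome classes that actually occur in the data
n_classes <- sum(outcome_counts > 0)
if (n_classes < 2) {
  message("Outcome has fewer than two classes; naive Bayes cannot be fit.")
}

# Cross-tabulate a categorical predictor against the outcome to spot
# predictor levels where an outcome class never occurs
print(table(complaints_train$state, complaints_train$product))
```

If the first table shows a single class, the warning from naive_bayes() is expected, and the problem lies in how the outcome column was built rather than in the model.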

boringstuff · January 8, 2021, 1:43am

I think I've got it now: after cleaning the data, there is indeed only one class. I will try a different dataset and get back to you. Thanks!
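
For later readers, here is one hypothetical way a cleaning step can leave a single class: recoding the outcome by exact string match when the labels in the downloaded file do not match the expected string. The labels and the helper has_two_classes() below are made up for illustration; the actual cleaning code was not posted in this thread.

```r
# Hypothetical raw labels; none of them exactly equals the target string.
raw_product <- c("Credit reporting", "Mortgage",
                 "Credit reporting", "Debt collection")
target <- "Credit reporting, credit repair services, or other personal consumer reports"

# Exact-match recoding: every row falls into "Other"
product <- factor(ifelse(raw_product == target, "Credit", "Other"))
print(levels(product))

# Hypothetical helper: TRUE when at least two distinct non-missing
# classes are present in the outcome
has_two_classes <- function(y) {
  length(unique(stats::na.omit(as.character(y)))) >= 2
}

if (!has_two_classes(product)) {
  message("Only one outcome class after cleaning; check the recoding step.")
}
```

Running a check like this right after cleaning, and before splitting or resampling, surfaces the problem earlier than the model warning does.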
