Cost-sensitive learning using tidymodels

Enuma · September 21, 2022, 8:39pm

Hi!
I would like to specify a custom cost matrix for misclassification in binary/multiclass problems (using rf, xgboost), but I am having a hard time finding a good example with tidymodels.

I found this SO thread.

But the example code fails with Error(s) x5: Error in value[3L]: ! In metric: classification_cost_penalized object 'cost_matrix' not found... The cost_matrix tibble is, of course, in my environment.

What is the current best approach to tackle this with tidymodels?
Thanks!

The example code from SO:

library(tidymodels)

# load data
data("two_class_example")
data("two_class_dat")

cost_matrix <- tribble(
  ~truth, ~estimate, ~cost,
  "Class1", "Class2",  2,
  "Class2", "Class1",  1
)

classification_cost_penalized <- metric_tweak(
  .name = "classification_cost_penalized",
  .fn = classification_cost,
  costs = cost_matrix
)

# test if this works on the simulated estimates
two_class_example %>% 
  classification_cost_penalized(truth = truth, class_prob = Class1)
#> # A tibble: 1 × 3
#>   .metric                       .estimator .estimate
#>   <chr>                         <chr>          <dbl>
#> 1 classification_cost_penalized binary         0.260

# specify a RF model
my_model <- 
  rand_forest(
    mtry = tune(), 
    min_n = tune(),
    trees = 500
  ) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

# specify recipe
my_recipe <- recipe(Class ~ A + B, data = two_class_dat)

# bundle to workflow
my_wf <- workflow() %>% 
  add_model(my_model) %>% 
  add_recipe(my_recipe)

# start tuning
tuned_rf <- my_wf %>% 
  tune_grid(
    resamples = vfold_cv(two_class_dat, v = 5),
    grid = 5,
    metrics = metric_set(classification_cost_penalized)
  )

Max · September 22, 2022, 4:34pm

I can't reproduce the issue, but I do have a suggestion below.

library(tidymodels)

# load data
data("two_class_example")
data("two_class_dat")

cost_matrix <- tribble(
  ~truth, ~estimate, ~cost,
  "Class1", "Class2",  2,
  "Class2", "Class1",  1
)

classification_cost_penalized <- metric_tweak(
  .name = "classification_cost_penalized",
  .fn = classification_cost,
  costs = cost_matrix
)

# test if this works on the simulated estimates
two_class_example %>% 
  classification_cost_penalized(truth = truth, class_prob = Class1)
#> # A tibble: 1 × 3
#>   .metric                       .estimator .estimate
#>   <chr>                         <chr>          <dbl>
#> 1 classification_cost_penalized binary         0.260

# specify a RF model
my_model <- 
  rand_forest(
    mtry = tune(), 
    min_n = tune(),
    trees = 500
  ) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

# specify recipe
my_recipe <- recipe(Class ~ A + B, data = two_class_dat)

# bundle to workflow
my_wf <- workflow() %>% 
  add_model(my_model) %>% 
  add_recipe(my_recipe)

# start tuning
tuned_rf <- my_wf %>% 
  tune_grid(
    resamples = vfold_cv(two_class_dat, v = 5),
    grid = 5,
    metrics = metric_set(classification_cost_penalized)
  )
#> i Creating pre-processing data to finalize unknown parameter: mtry
show_best(tuned_rf)
#> # A tibble: 5 × 8
#>    mtry min_n .metric                       .estim…¹  mean     n std_err .config
#>   <int> <int> <chr>                         <chr>    <dbl> <int>   <dbl> <chr>  
#> 1     2    28 classification_cost_penalized binary   0.390     5  0.0141 Prepro…
#> 2     2    21 classification_cost_penalized binary   0.390     5  0.0144 Prepro…
#> 3     2    33 classification_cost_penalized binary   0.391     5  0.0137 Prepro…
#> 4     1    15 classification_cost_penalized binary   0.399     5  0.0158 Prepro…
#> 5     1     9 classification_cost_penalized binary   0.401     5  0.0165 Prepro…
#> # … with abbreviated variable name ¹.estimator

Created on 2022-09-22 with reprex v2.0.2

I would suggest using the !! operator to pass the cost info tibble itself (as opposed to a reference to it). Try using:

classification_cost_penalized <- metric_tweak(
  .name = "classification_cost_penalized",
  .fn = classification_cost,
  costs = !!cost_matrix       # <- change here
)
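
To illustrate the difference between passing a reference and passing the object itself, here is a minimal sketch using rlang directly. It assumes that metric_tweak() captures its extra arguments lazily, much like the hypothetical capture() helper below; the helper is only a stand-in for illustration and is not part of yardstick.

```r
library(rlang)

# Hypothetical stand-in for a function that captures its `...` lazily
capture <- function(...) enquos(...)

cost_matrix <- data.frame(
  truth    = c("Class1", "Class2"),
  estimate = c("Class2", "Class1"),
  cost     = c(2, 1)
)

by_reference <- capture(costs = cost_matrix)    # stores the symbol `cost_matrix`
by_value     <- capture(costs = !!cost_matrix)  # stores the data frame itself

quo_is_symbol(by_reference$costs)  # TRUE: only a name was captured
quo_is_symbol(by_value$costs)      # FALSE: the value was inlined

# Simulate a session in which the object is no longer available
rm(cost_matrix)

# Looking up the name now fails; the error is kept as an object for inspection
lookup_error <- tryCatch(eval_tidy(by_reference$costs), error = function(e) e)

# The inlined copy still evaluates, because it never needs the name
inlined <- eval_tidy(by_value$costs)
```

If the assumption holds, a metric built with costs = !!cost_matrix carries its own copy of the costs, so it should not depend on cost_matrix being visible wherever the metric is eventually evaluated.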

Enuma · September 22, 2022, 7:59pm

I got a different error using this solution. On the flip side, it made me think about swapping tribble() for tibble(). And lo and behold, it works now (Win10, R version 4.1.1, tibble 3.1.8, tune 1.0.0, tidymodels 1.0.0).
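
The post does not show the exact code that ended up working, so the following is only a sketch of what a tibble() version of the same cost matrix could look like. The column names and values are taken from the tribble() call in the original example.

```r
library(tibble)

# Same costs as the tribble() version, built column-wise with tibble()
cost_matrix <- tibble(
  truth    = c("Class1", "Class2"),
  estimate = c("Class2", "Class1"),
  cost     = c(2, 1)
)
```

This object can be passed to metric_tweak() through the costs argument exactly as in the original example.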

Now I am only getting a warning: "More than one set of outcomes were used when tuning. This should never happen. Review how the outcome is specified in your model." This seems unrelated to the original issue.

Thanks Max!
