Hi!
I would like to specify a custom misclassification cost matrix for binary/multiclass problems (using rf, xgboost), but I am having a hard time finding a good example with tidymodels.
The only example I found is from SO (code below), but it fails with: Error(s) x5: Error in value[3L]: ! In metric: classification_cost_penalized object 'cost_matrix' not found... The cost_matrix tibble is of course in my environment.
What is the current best approach to tackle this with tidymodels?
Thanks!
The example code from SO:
library(tidymodels)

# load data
data("two_class_example")
data("two_class_dat")

cost_matrix <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  2,
  "Class2", "Class1",  1
)

classification_cost_penalized <- metric_tweak(
  .name = "classification_cost_penalized",
  .fn = classification_cost,
  costs = cost_matrix
)

# test if this works on the simulated estimates
two_class_example %>%
  classification_cost_penalized(truth = truth, class_prob = Class1)
#> # A tibble: 1 × 3
#>   .metric                       .estimator .estimate
#>   <chr>                         <chr>          <dbl>
#> 1 classification_cost_penalized binary         0.260

# specify a RF model
my_model <-
  rand_forest(
    mtry = tune(),
    min_n = tune(),
    trees = 500
  ) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# specify recipe
my_recipe <- recipe(Class ~ A + B, data = two_class_dat)

# bundle to workflow
my_wf <- workflow() %>%
  add_model(my_model) %>%
  add_recipe(my_recipe)

# start tuning
tuned_rf <- my_wf %>%
  tune_grid(
    resamples = vfold_cv(two_class_dat, v = 5),
    grid = 5,
    metrics = metric_set(classification_cost_penalized)
  )
I got a different error using this solution. On the flip side, it made me think about building the cost matrix with tibble() instead of tribble(). And lo and behold, it works now (Win10, R version 4.1.1, tibble: 3.1.8, tune: 1.0.0, tidymodels: 1.0.0).
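For anyone landing here later, the change presumably looks something like the sketch below. The exact code was not posted, so this assumes the same column names (truth, estimate, cost) and the same costs as the original tribble, with only the constructor swapped; why that makes a difference is not established here.

```r
library(tidymodels)

# Assumed rewrite: same cost matrix as before, built column-wise with tibble()
# instead of row-wise with tribble().
cost_matrix <- tibble(
  truth    = c("Class1", "Class2"),
  estimate = c("Class2", "Class1"),
  cost     = c(2, 1)
)

# Same metric definition as in the original example.
classification_cost_penalized <- metric_tweak(
  .name = "classification_cost_penalized",
  .fn = classification_cost,
  costs = cost_matrix
)

# Quick check on the simulated predictions shipped with yardstick.
res <- classification_cost_penalized(two_class_example, truth = truth, Class1)
res
```

The metric can then be passed to tune_grid() via metric_set(classification_cost_penalized), exactly as in the original code.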
Now I am only getting a warning: "More than one set of outcomes were used when tuning. This should never happen. Review how the outcome is specified in your model." This seems unrelated to the original issue.