How to specify a dummy model/heuristic rule in tidymodels?

gordon_m · March 20, 2024, 9:57am

I'm comparing a few ML models on my dataset using tidymodels and workflowsets, and at the same time I want to compare them to a heuristic rule that is commonly used in the domain. I thought it would be simple to specify the rule, e.g. y_pred = (x1 > 3) | (x2 < 1), as a model on the same data, tune nothing (since it won't change), and then compare it to all the other models with yardstick etc., as it's effectively just a poorly fit model. But I cannot for the life of me figure out the right way to specify it cleanly at the start, in the same way as the models that actually get fit.

simoncouch · March 20, 2024, 12:37pm

This is a duplicate of a post on Stack Overflow, so I'm pasting the answer here as well!

The community-contributed parsnip extension package bespoke lets you define these sorts of models. Install it with:

pak::pak("macmillancontentscience/bespoke")

The main function, bespoke(), takes a function fn that accepts a data frame as input and returns a vector (integer, character, or factor) of predicted outcomes, with one value per input row. A quick example of how that might look in action:

library(parsnip)
library(bespoke)

dat <- data.frame(
  y = factor(sample(c("a", "b"), 10, replace = TRUE)), 
  x1 = rnorm(10), 
  x2 = rnorm(10, .5)
)

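make_pred <- function(x) {
  y_pred <- x$x1 > x$x2
  # Set the levels explicitly (FALSE -> "a", TRUE -> "b"); factor(y_pred,
  # labels = c("a", "b")) errors if every row gets the same value.
  factor(ifelse(y_pred, "b", "a"), levels = c("a", "b"))
}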

model_spec <- bespoke(fn = make_pred)

model_spec
#> bespoke Model Specification (classification)
#> 
#> Main Arguments:
#>   fn = make_pred
#> 
#> Computational engine: bespoke

model_fit <- model_spec %>% fit(y ~ x1 + x2, dat)

predict(model_fit, dat)
#> # A tibble: 10 × 1
#>    .pred_class
#>    <fct>      
#>  1 b          
#>  2 b          
#>  3 b          
#>  4 a          
#>  5 a          
#>  6 b          
#>  7 a          
#>  8 b          
#>  9 a          
#> 10 b

Created on 2024-03-20 with reprex v2.1.0
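Applied to the rule from the original question, a minimal sketch might look like the following. It assumes the predictor columns are named x1 and x2 and that the outcome levels are "a" (rule not triggered) and "b" (rule triggered), as in the example above; heuristic_rule, toy and truth are hypothetical names used only for illustration. The function is checked directly in base R, and the bespoke() call is left as a comment so the snippet runs without the package installed.

```r
# Hypothetical helper: the domain heuristic from the question,
# (x1 > 3) | (x2 < 1), written as a function of a data frame of predictors.
# Assumption: "b" means the rule fires and "a" means it does not.
heuristic_rule <- function(x) {
  flag <- (x$x1 > 3) | (x$x2 < 1)
  # Fix the levels so both classes exist even when every row gets the
  # same prediction (and so they match the outcome's levels for scoring).
  factor(ifelse(flag, "b", "a"), levels = c("a", "b"))
}

# Check the rule on a small hand-made data frame.
toy <- data.frame(x1 = c(4, 0, 0), x2 = c(2, 0, 2))
pred <- heuristic_rule(toy)
pred
#> [1] b b a
#> Levels: a b

# Score it against made-up true labels; yardstick::accuracy_vec(truth, pred)
# should return the same value.
truth <- factor(c("b", "a", "a"), levels = c("a", "b"))
mean(pred == truth)
#> [1] 0.6666667

# With bespoke installed, the rule becomes a parsnip model specification:
# rule_spec <- bespoke::bespoke(fn = heuristic_rule)
```

Since there is nothing to tune, resampling such a specification with fit_resamples() rather than tune_grid() seems the natural choice, though I haven't verified how bespoke behaves inside a workflow set.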

system · April 10, 2024, 12:37pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
