I'm comparing a few ML models on my dataset using tidymodels and workflowsets, and I also want to compare them against a heuristic rule that is commonly used in the domain. I thought it might be simple to specify the rule, e.g. `y_pred = (x1 > 3) | (x2 < 1)`, as a model on the same data, tune nothing (as it won't change), and then compare it to all the other models using yardstick etc., since it's effectively just a poorly fit model. I cannot for the life of me figure out the right way to specify it cleanly at the start, in the same way as the models that actually get fit.

Duplicate post on StackOverflow. Pasting the answer here as well!

The community-contributed parsnip extension package bespoke allows folks to define these sorts of models. Install with:

```
pak::pak("macmillancontentscience/bespoke")
```

The main function, `bespoke()`, takes a function `fn` that accepts a data frame as input and returns a vector (integer, character, or factor) indicating the outcomes as output (with one value per input row). A quick example of how that might look in action:

```
library(parsnip)
library(bespoke)
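# Toy data: a two-level outcome and two numeric predictors
dat <- data.frame(
  y = factor(sample(c("a", "b"), 10, replace = TRUE)),
  x1 = rnorm(10),
  x2 = rnorm(10, .5)
)

# The rule: takes a data frame, returns one prediction per row
make_pred <- function(x) {
  y_pred <- x$x1 > x$x2
  # Explicit levels keep both classes even if every row gets the same prediction
  factor(y_pred, levels = c(FALSE, TRUE), labels = c("a", "b"))
}
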
model_spec <- bespoke(fn = make_pred)
model_spec
#> bespoke Model Specification (classification)
#>
#> Main Arguments:
#> fn = make_pred
#>
#> Computational engine: bespoke
model_fit <- model_spec %>% fit(y ~ x1 + x2, dat)
predict(model_fit, dat)
#> # A tibble: 10 × 1
#> .pred_class
#> <fct>
#> 1 b
#> 2 b
#> 3 b
#> 4 a
#> 5 a
#> 6 b
#> 7 a
#> 8 b
#> 9 a
#> 10 b
```

Created on 2024-03-20 with reprex v2.1.0
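For the rule from the original question, `(x1 > 3) | (x2 < 1)`, the prediction function might look something like the sketch below. The name `rule_pred` is made up for illustration, and mapping "rule fires" to level `"b"` is an assumption chosen to match the toy `dat` above; swap in your outcome's real levels. The function itself is base R, so it can be checked on its own before being handed to `bespoke()`.

```r
# Heuristic from the question: the rule fires when x1 > 3 or x2 < 1.
# Assumption: "b" means the rule fired and "a" means it did not.
rule_pred <- function(x) {
  hit <- (x$x1 > 3) | (x$x2 < 1)
  # Set levels explicitly so both classes are kept even when every
  # row lands on the same side of the rule.
  factor(ifelse(hit, "b", "a"), levels = c("a", "b"))
}

# Small hand-made data frame to check the rule in isolation.
toy <- data.frame(
  x1 = c(4, 0, 2, 5),
  x2 = c(2, 0.5, 3, 0)
)
preds <- rule_pred(toy)
print(preds)

# Assumed usage, following the pattern in the answer above (not run here):
# rule_spec <- bespoke::bespoke(fn = rule_pred)
# rule_fit  <- parsnip::fit(rule_spec, y ~ x1 + x2, data = dat)
```

Since nothing is estimated from the data, "fitting" this spec should just wrap the function so that `predict()` returns a `.pred_class` column like any other classification model, which is what lets it be scored with yardstick alongside the fitted models.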
