I want to check my understanding of how to fit a tuned workflow that contains a tailor postprocessor. Specifically, I want to check that I am implementing this section of the docs correctly:
When fitting a workflow with a postprocessor that requires training (i.e. one that returns TRUE in .workflow_postprocessor_requires_fit(workflow)), users must pass two data arguments: the usual fit.workflow(data) will be used to train the preprocessor and model while fit.workflow(data_calibration) will be used to train the postprocessor.
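A quick way to see which case applies is to ask the workflow directly. The sketch below assumes that the helper named in the docs, workflows::.workflow_postprocessor_requires_fit(), is available in your installed workflows version. The logistic regression model and the 0.5 threshold are placeholders chosen only so the example runs without tuning. Isotonic calibration is estimated from data, so that tailor should need data_calibration, while a tailor that only applies a fixed threshold should not.

```r
library(workflows)
library(parsnip)
library(tailor)
library(probably)

# Placeholder workflow builder: same formula and model, different tailor
make_wf <- function(tlr) {
  workflow() %>%
    add_formula(class ~ .) %>%
    add_model(logistic_reg()) %>%
    add_tailor(tlr)
}

# Calibration has to be estimated from data, so this tailor needs training
tlr_cal <- tailor() %>%
  adjust_probability_calibration(method = "isotonic") %>%
  adjust_probability_threshold(threshold = 0.5)

# A fixed threshold is applied as-is and needs no training data
tlr_thr <- tailor() %>%
  adjust_probability_threshold(threshold = 0.5)

# Assumes the helper named in the fit.workflow() docs is available
needs_cal <- workflows::.workflow_postprocessor_requires_fit(make_wf(tlr_cal))
needs_thr <- workflows::.workflow_postprocessor_requires_fit(make_wf(tlr_thr))
```

When needs_cal is TRUE, calling fit() without data_calibration produces the "requires `data_calibration`" error shown in the reprex.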
I tried using internal_calibration_split(), as the docs suggest, but got an error:
cal_split <- rsample::internal_calibration_split(split)
#> Error in split_args$times <- 1: object of type 'symbol' is not subsettable
I then used initial_split() to reserve some data for the calibration step. This worked, but I want to make sure it is an appropriate approach:
cal_split <- rsample::initial_split(data, prop = 0.8)
model <- rsample::training(cal_split)
calib <- rsample::testing(cal_split)
full_fit <- fit(
final_wf,
data = model,
data_calibration = calib
)
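If you go the initial_split() route, one option is to stratify on the outcome so the calibration set keeps roughly the same class balance as the rows used to fit the model. This is a sketch under that assumption: the 0.8 proportion and the seeds are arbitrary, and the names model_data and calib_data are only for illustration.

```r
library(rsample)
library(modeldata)

set.seed(1)
dat <- sim_classification(2000)

# Hold out ~20% of rows for training the postprocessor; stratifying on the
# outcome keeps the class proportions similar in both pieces
set.seed(2)
cal_split <- initial_split(dat, prop = 0.8, strata = class)
model_data <- training(cal_split)  # used to fit the preprocessor and model
calib_data <- testing(cal_split)   # passed as data_calibration

# Share of the first class level in each piece, for comparison
first_level <- levels(dat$class)[1]
prop_model <- mean(model_data$class == first_level)
prop_calib <- mean(calib_data$class == first_level)
```

The two pieces are disjoint, so the calibration step never sees rows that were used to fit the random forest itself.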
Full reprex:
library(workflows)
library(dplyr)
library(parsnip)
library(rsample)
library(tune)
library(modeldata)
library(probably)
library(tailor)
library(finetune)
data <- sim_classification(2000)
set.seed(1)
split <- initial_split(data)
train <- training(split)
test <- testing(split)
set.seed(1)
folds <- vfold_cv(train)
tlr <-
tailor() %>%
adjust_probability_calibration(method = "isotonic") %>%
adjust_probability_threshold(threshold = tune())
wflow <-
workflow() %>%
add_formula(class ~ .) %>%
add_model(rand_forest(mtry = tune(), mode = "classification", trees = 3)) %>%
add_tailor(tlr)
set.seed(1)
tune_results <-
tune_grid(
wflow,
folds,
control = control_resamples(save_pred = TRUE)
)
#> i Creating pre-processing data to finalize 1 unknown parameter: "mtry"
# evaluate
best <- select_best(tune_results, metric = "accuracy")
final_wf <- finalize_workflow(wflow, best)
# apply to test data
lf <- last_fit(final_wf, split)
test_metrics <- collect_metrics(lf)
test_preds <- collect_predictions(lf)
# ######################################
# HERE IS WHERE MY ISSUES BEGIN
# ######################################
# fit the best model on the entire dataset: this errors, as expected from the tailor docs
full_fit <-
fit(
final_wf,
data = data
)
#> Error in `fit()`:
#> ! The workflow requires `data_calibration` to train but none was
#> supplied.
# internal_calibration_split() gives an error:
cal_split <- rsample::internal_calibration_split(split)
#> Error in split_args$times <- 1: object of type 'symbol' is not subsettable
# this alternative works, but is it correct?
cal_split <- rsample::initial_split(data, prop = 0.8)
model <- rsample::training(cal_split)
calib <- rsample::testing(cal_split)
full_fit <- fit(
final_wf,
data = model,
data_calibration = calib
)
full_fit
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> Postprocessor: tailor
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> class ~ .
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Ranger result
#>
#> Call:
#> ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~13L, x), num.trees = ~3, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE)
#>
#> Type: Probability estimation
#> Number of trees: 3
#> Sample size: 1600
#> Number of independent variables: 15
#> Mtry: 13
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error (Brier s.): 0.1309023
#>
#> ── Postprocessor ───────────────────────────────────────────────────────────────
#>
#> ── tailor ──────────────────────────────────────────────────────────────────────
#> A binary postprocessor with 2 adjustments:
#>
#> • Re-calibrate classification probabilities using isotonic method.
#> • Adjust probability threshold to 0.222.
#> NA
#> NA
#> NA
Created on 2025-11-05 with reprex v2.1.1
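The same pattern can be wrapped up so the calibration hold-out is only carved off when the postprocessor actually needs it. fit_final_workflow() below is a hypothetical helper, not part of workflows or tune. It again assumes .workflow_postprocessor_requires_fit() is available, and it uses logistic_reg() with a fixed threshold as stand-ins for the tuned random forest.

```r
library(workflows)
library(parsnip)
library(tailor)
library(probably)
library(rsample)
library(modeldata)

# Hypothetical helper (not part of workflows or tune): fit a finalized
# workflow on all available data, holding out a calibration set only when
# the postprocessor needs training. `...` is forwarded to initial_split(),
# e.g. strata = class.
fit_final_workflow <- function(wf, data, prop = 0.8, ...) {
  if (!workflows::.workflow_postprocessor_requires_fit(wf)) {
    # Nothing to calibrate: use every row for the model
    return(fit(wf, data = data))
  }
  cal_split <- initial_split(data, prop = prop, ...)
  fit(
    wf,
    data = training(cal_split),
    data_calibration = testing(cal_split)
  )
}

set.seed(1)
dat <- sim_classification(500)

# Stand-in for final_wf: same tailor steps, fixed threshold instead of tune()
wf <- workflow() %>%
  add_formula(class ~ .) %>%
  add_model(logistic_reg()) %>%
  add_tailor(
    tailor() %>%
      adjust_probability_calibration(method = "isotonic") %>%
      adjust_probability_threshold(threshold = 0.4)
  )

set.seed(2)
final_fit <- fit_final_workflow(wf, dat, strata = class)
preds <- predict(final_fit, dat[1:5, ])
```

The design choice here is simply to make the split conditional; whether 20% is the right amount of data to give up for calibration depends on how much data you have.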