Clarification on data usage when fitting a workflow with a tailor post-processor

I want to check my understanding of how to fit a tuned workflow that contains a tailor post-processor. Specifically, I want to confirm that I am implementing this section of the docs correctly:

When fitting a workflow with a postprocessor that requires training (i.e. one that returns TRUE in .workflow_postprocessor_requires_fit(workflow)), users must pass two data arguments: the usual fit.workflow(data) will be used to train the preprocessor and model, while fit.workflow(data_calibration) will be used to train the postprocessor.

I tried using internal_calibration_split() as the docs suggest but got an error:

cal_split <- rsample::internal_calibration_split(split)
#> Error in split_args$times <- 1: object of type 'symbol' is not subsettable

I then just used initial_split() to reserve some data for the calibration step, which worked, but I want to make sure this is an appropriate approach:

cal_split <- rsample::initial_split(data, prop = 0.8)
model <- rsample::training(cal_split)  # used to train the preprocessor and model
calib <- rsample::testing(cal_split)   # held out to train the tailor (calibration)

full_fit <- fit(
  final_wf,
  data = model,
  data_calibration = calib
)

Full reprex:

library(workflows)
library(dplyr)
library(parsnip)
library(rsample)
library(tune)
library(modeldata)
library(probably)
library(tailor)
library(finetune)

data <- sim_classification(2000)

set.seed(1)
split <- initial_split(data)
train <- training(split)
test <- testing(split)

set.seed(1)
folds <- vfold_cv(train)

tlr <-
  tailor() %>%
  adjust_probability_calibration(method = "isotonic") %>%
  adjust_probability_threshold(threshold = tune())

wflow <-
  workflow() %>%
  add_formula(class ~ .) %>%
  add_model(rand_forest(mtry = tune(), mode = "classification", trees = 3)) %>%
  add_tailor(tlr)

set.seed(1)
tune_results <-
  tune_grid(
    wflow,
    folds,
    control = control_resamples(save_pred = TRUE)
  )
#> i Creating pre-processing data to finalize 1 unknown parameter: "mtry"

# evaluate
best <- select_best(tune_results, metric = "accuracy")
final_wf <- finalize_workflow(wflow, best)

# apply to test data
lf <- last_fit(final_wf, split)
test_metrics <- collect_metrics(lf)
test_preds <- collect_predictions(lf)

# ######################################
# HERE IS WHERE MY ISSUES BEGIN
# ######################################

# fit best model on the entire dataset: errors out as expected based on the tailor docs
full_fit <-
  fit(
    final_wf,
    data = data
  )
#> Error in `fit()`:
#> ! The workflow requires `data_calibration` to train but none was
#>   supplied.

# internal_calibration_split() gives an error:
cal_split <- rsample::internal_calibration_split(split)
#> Error in split_args$times <- 1: object of type 'symbol' is not subsettable

# trying a different way works but is it correct???
cal_split <- rsample::initial_split(data, prop = 0.8)
model <- rsample::training(cal_split)
calib <- rsample::testing(cal_split)

full_fit <- fit(
  final_wf,
  data = model,
  data_calibration = calib
)

full_fit
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> Postprocessor: tailor
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> class ~ .
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~13L,      x), num.trees = ~3, num.threads = 1, verbose = FALSE, seed = sample.int(10^5,      1), probability = TRUE) 
#> 
#> Type:                             Probability estimation 
#> Number of trees:                  3 
#> Sample size:                      1600 
#> Number of independent variables:  15 
#> Mtry:                             13 
#> Target node size:                 10 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error (Brier s.):  0.1309023 
#> 
#> ── Postprocessor ───────────────────────────────────────────────────────────────
#> 
#> ── tailor ──────────────────────────────────────────────────────────────────────
#> A binary postprocessor with 2 adjustments:
#> 
#> • Re-calibrate classification probabilities using isotonic method.
#> • Adjust probability threshold to 0.222.
#> NA
#> NA
#> NA
Created on 2025-11-05 with reprex v2.1.1

Not to sidestep your question, but why are you using all of data to fit your final model? We would recommend fitting it on all of your training data so that you still have the test set to tell you how well your model works. You did that with last_fit() already. I would stop there and get two things from your lf object: the performance metrics as evaluated on the test set (your test_metrics) and the model fitted on the training data (via extract_fit_parsnip() on lf). If you refit the workflow on all of data, you can no longer say how well the model works.
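For example, something like this (the object names are just placeholders, and extract_workflow() is an alternative if you want the whole trained workflow, tailor included, rather than just the parsnip fit):

test_metrics <- collect_metrics(lf)       # performance evaluated on the test set
fitted_model <- extract_fit_parsnip(lf)   # parsnip model fitted on the training data
fitted_wflow <- extract_workflow(lf)      # or the whole trained workflow, tailor included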

What happens inside of last_fit() is pretty much what you did manually, with the important difference that it splits up the training data, not the full dataset, with an initial_split() call.
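A rough manual equivalent, just for illustration (the object names and the 0.8 proportion here are arbitrary, and the exact split last_fit() uses internally may differ):

set.seed(1)
inner <- rsample::initial_split(train, prop = 0.8)   # split the training data, not all of data

manual_fit <- fit(
  final_wf,
  data             = rsample::training(inner),   # trains the preprocessor and model
  data_calibration = rsample::testing(inner)     # trains the tailor (calibration + threshold)
)

manual_preds <- predict(manual_fit, new_data = test)   # test set stays untouched for evaluation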

Thanks! Your points make sense. To clarify why I'd use all of the data to fit the final model: it's meant to mimic the last step before deploying the model to production.

To be honest, I'm not sure it's 100% necessary to do this, but I recall learning that once the model hyperparameters are finalized and you have measures of performance via cross-validation, you fit the final model on all available data.

It wasn't clear in my post, but after fitting the model on all the data and deploying it, I'd only be measuring model performance against new data at that point.

My thinking on this might be wrong and I appreciate your insights; if it's sufficient to bundle and deploy after last_fit(), that makes things easier.
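If stopping at last_fit() is sufficient, I guess the deployment step would look something like this (assuming the bundle package, which isn't loaded in my reprex above):

library(bundle)
bundled_wflow <- bundle(extract_workflow(lf))   # serialize the trained workflow for deployment
# save with saveRDS(), ship it, then unbundle() at prediction time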