After doing feature selection (specifically, dropping features), I am wondering whether there is a way for a trained workflow to ignore predictors that were dropped by a recipe step.
For example:
library(tidymodels)
# adding in a predictor that will be dropped
mt2 <-
  mtcars |>
  mutate(zv = 1)
# build recipe, showing that it is correctly getting dropped
rec <-
  mt2 |>
  recipe(mpg ~ hp + zv) |>
  step_zv(all_predictors())
prep(rec)
#>
#> -- Recipe ----------------------------------------------------------------------
#>
#> -- Inputs
#> Number of variables by role
#> outcome: 1
#> predictor: 2
#>
#> -- Training information
#> Training data contained 32 data points and no incomplete rows.
#>
#> -- Operations
#> * Zero variance filter removed: zv | Trained
# build/fit a workflow
spec <- linear_reg()
wf <- workflow(rec, spec)
wf.fitted <- fit(wf, data = mt2)
# showing that the `zv` predictor was not used in the model fitting
extract_fit_engine(wf.fitted)
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#> (Intercept)           hp
#>    30.09886     -0.06823
# trying to predict on data that does not have the `zv` column
predict(wf.fitted, new_data = mtcars)
#> Error in `validate_column_names()`:
#> ! The following required columns are missing: 'zv'.
Created on 2024-03-27 with reprex v2.0.2
My questions are:
- I'm sure this was an intentional decision and that there is a good reason for it, but what is that reason?
- Is there any way of telling a trained workflow to ignore variables that are dropped in a recipe step?
Thanks!