Predictors dropped with a recipe step

After doing feature selection - specifically, feature dropping - I am wondering if there is a way for a trained workflow to ignore predictors that were dropped from a recipe step.

Such as...


library(tidymodels)

# adding in a predictor that will be dropped
mt2 <-
    mtcars |> 
    mutate(zv = 1)

# build recipe, showing that it is correctly getting dropped
rec <-
    mt2 |> 
    recipe(mpg ~ hp + zv) |> 
    step_zv(all_predictors())

prep(rec)
#> -- Recipe ----------------------------------------------------------------------
#> -- Inputs
#> Number of variables by role
#> outcome:   1
#> predictor: 2
#> -- Training information
#> Training data contained 32 data points and no incomplete rows.
#> -- Operations
#> * Zero variance filter removed: zv | Trained

# build/fit a workflow
spec <- linear_reg()

wf <- workflow(rec, spec)

wf.fitted <- fit(wf, data = mt2)

# showing that the `zv` predictor was not used in the model fitting
extract_fit_engine(wf.fitted)
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#> Coefficients:
#> (Intercept)           hp  
#>    30.09886     -0.06823

# trying to predict on data that does not have the `zv` column
predict(wf.fitted, new_data = mtcars)
#> Error in `validate_column_names()`:
#> ! The following required columns are missing: 'zv'.

Created on 2024-03-27 with reprex v2.0.2

My questions are:

  1. I'm sure this was an intentional decision and that there is a good reason for it, but I am wondering what it is.
  2. Is there any way of telling a trained workflow to ignore variables that are dropped in a recipe step?


It's very hard to retrospectively change the data given as input. That's why R's model formula apparatus doesn't do the same.
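For comparison, here is a minimal base R sketch of the same situation, assuming the data from the question. A model fit through the formula interface still looks up every variable named in the formula at prediction time, even one whose coefficient could not be estimated. The names `fit` and `res` are only for illustration.

```r
# Base R only; `mt2` mirrors the data from the question.
mt2 <- transform(mtcars, zv = 1)

fit <- lm(mpg ~ hp + zv, data = mt2)

# `zv` is constant, so lm() cannot estimate a coefficient for it (it is NA).
coef(fit)

# predict() still rebuilds the model frame from the full formula,
# so new data without `zv` fails. try() captures the error instead of stopping.
res <- try(predict(fit, newdata = mtcars), silent = TRUE)
inherits(res, "try-error")
```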

You can put in a feature request; we might be able to make an API for that. I added an issue.

As for telling a trained workflow to ignore variables that are dropped in a recipe step: not yet.

For this model (and no other preprocessing), you could use tidypredict to get the prediction equation and use that.
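A rough sketch of the tidypredict route, assuming the workflow from the question and that the only preprocessing is the zero-variance filter. It pulls the underlying `lm` object out of the fitted workflow, converts it to an R expression with `tidypredict_fit()`, and evaluates that expression on data without `zv`. The names `lm_fit`, `eqn`, and `preds` are illustrative, and this would not carry over to recipes that transform the predictors.

```r
library(tidymodels)
library(tidypredict)

# Rebuild the fitted workflow from the question.
mt2 <- mtcars |> mutate(zv = 1)

rec <-
    recipe(mpg ~ hp + zv, data = mt2) |>
    step_zv(all_predictors())

wf.fitted <- workflow(rec, linear_reg()) |> fit(data = mt2)

# Extract the underlying lm object and turn it into a prediction expression.
lm_fit <- extract_fit_engine(wf.fitted)
eqn <- tidypredict_fit(lm_fit)
eqn

# The expression only refers to `hp`, so data without `zv` works.
preds <- mtcars |> mutate(.pred = !!eqn)
head(preds[, c("hp", ".pred")])
```

Because the equation is a plain R expression, it bypasses the workflow's column checks entirely, which is also why it ignores any recipe steps.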


Thanks, Max. This is particularly useful in the event that feature selection was done and it is determined that one or more of the dropped predictors isn't worth collecting/storing in the future. Feels like it could help "future proof" a trained workflow. Certainly many ways around this but would be a useful feature nonetheless.
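One of the possible workarounds, sketched under the assumption that the workflow is the one from the question: give the new data a placeholder `zv` column of the same type so the column check passes. The zero-variance step removes the column again before the model sees it, so the placeholder value should not affect the predictions. The name `new_data` is illustrative.

```r
library(tidymodels)

# Rebuild the fitted workflow from the question.
mt2 <- mtcars |> mutate(zv = 1)

rec <-
    recipe(mpg ~ hp + zv, data = mt2) |>
    step_zv(all_predictors())

wf.fitted <- workflow(rec, linear_reg()) |> fit(data = mt2)

# Add a placeholder for the dropped predictor; it only needs to exist
# and have the same type (double) as in the training data.
new_data <- mtcars |> mutate(zv = NA_real_)

preds <- predict(wf.fitted, new_data = new_data)
head(preds)
```

This keeps the trained workflow untouched, at the cost of having to remember which dropped columns need placeholders.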
