Hey Matt,
That should not be the case.
When passing the data to the model, only the predictors and outcomes should be exposed to the modeling function. If that is not what you are seeing, please file an issue as soon as you can, since we will be sending new versions of recipes (and hardhat) to CRAN very soon.
For your example:
library(tidymodels)
library(hardhat) # I have hardhat_1.1.0 from CRAN
tidymodels_prefer()
data(biomass)
# However, `sample` and `dataset` aren't predictors. Since they already have
# roles, `update_role()` can be used to change them to any arbitrary role:
rec <-
recipe(HHV ~ ., data = biomass) %>%
update_role(sample, new_role = "id variable") %>%
update_role(dataset, new_role = "splitting variable")
summary(rec)
#> # A tibble: 8 × 4
#> variable type role source
#> <chr> <chr> <chr> <chr>
#> 1 sample nominal id variable original
#> 2 dataset nominal splitting variable original
#> 3 carbon numeric predictor original
#> 4 hydrogen numeric predictor original
#> 5 oxygen numeric predictor original
#> 6 nitrogen numeric predictor original
#> 7 sulfur numeric predictor original
#> 8 HHV numeric outcome original
wflow <-
workflow() %>%
add_recipe(rec) %>%
add_model(linear_reg())
wflow_fit <- fit(wflow, data = biomass)
# it should only get the predictors and outcomes
wflow_fit %>% extract_fit_engine() %>% coef() %>% names()
#> [1] "(Intercept)" "carbon" "hydrogen" "oxygen" "nitrogen"
#> [6] "sulfur"
Created on 2022-06-27 by the reprex package (v2.0.1)
The point of the non-standard roles: you can keep certain columns around in your data without them being used in the model. After you fit the model, you might want these around to troubleshoot poor predictions, make plots, or anything else. That is still the goal.
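For example, because those columns are never passed to the modeling function but stay in the data, you can line predictions up with them afterwards. Here is a sketch, assuming the `wflow_fit` object from the reprex above and the `augment()` method that workflows provides for fitted workflows:

```r
library(tidymodels)

# `augment()` returns the original data with prediction columns appended,
# so the id/splitting columns are still there for troubleshooting or plots.
preds <- augment(wflow_fit, new_data = biomass)

preds %>%
  select(sample, dataset, HHV, .pred) %>%
  head()
```

The model itself never saw `sample` or `dataset`, but they remain available to group, filter, or facet the predictions.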
What changed in hardhat 1.1.0?
To recap what is currently happening: the main change to hardhat was related to our addition of case weight tools across the tidymodels packages.
With case weights, we needed a way to determine which columns must be available when bake() is used.
With the case weight change, we needed to address non-standard roles (e.g. not predictor or outcome). Our first attempt resulted in a number of breakages (which you thankfully reported).
We have a better solution in a PR that is easier for users and will break fewer existing recipes and packages.
In the upcoming versions of hardhat and recipes, a new recipes function will let you declare what is required at bake time and what is not. It puts all of the choice into the recipe object, and the workflow and hardhat objects are mostly agnostic to these choices.
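As a sketch of how that could look, assuming the `update_role_requirements()` API that was added to recipes for this purpose (continuing from the `rec` object above):

```r
# Sketch: declare that columns with the "id variable" role are NOT required
# at bake()/predict() time, so new data may omit the `sample` column.
rec2 <- rec %>%
  update_role_requirements(role = "id variable", bake = FALSE)

wflow2 <- workflow() %>%
  add_recipe(rec2) %>%
  add_model(linear_reg())

wflow_fit2 <- fit(wflow2, data = biomass)

# Predicting on data that lacks the `sample` column should now work
predict(wflow_fit2, new_data = biomass %>% select(-sample))
```

The role requirement lives in the recipe object itself, which is what keeps workflows and hardhat agnostic to the choice.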
We're doing the most extensive reverse dependency checking that we can for these releases.