Is there a function that returns the list/vector/etc. of variables that "survive" the step functions? I have a simple reproducible example below. I'd like to be able to get a list of variables that are left at the end - in this case, only x2. This seems somewhat related to this post.
library(tidymodels)
set.seed(123)
samp_size <- 1000
# Creating sample data where x2 is highly correlated with x and x3 has near-zero variance
sample_data <- tibble(x = rnorm(samp_size)) %>%
  mutate(
    y = 3 + 2 * x + rnorm(samp_size, 0, .5),
    x2 = x + rnorm(samp_size, 0, .1),
    x3 = c(rep(0, samp_size - 1), 1)
  )
# Didn't break out training and testing since it's not needed for this simple example.
simple_recipe <- recipe(y ~ ., sample_data) %>%
  step_nzv(all_numeric_predictors()) %>%
  step_corr(all_numeric_predictors())
# When I run the steps, I'm only left with y and x2
simple_recipe %>%
  prep() %>%
  juice()
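For reference, here is a minimal sketch of one way to get at this, assuming that `summary()` on a prepped recipe reports the columns remaining after all steps and that `tidy()` on a single step reports the columns that step removed. The names `prepped` and `surviving` are made up for illustration, and the exact output may vary with the installed version of recipes.

```r
library(tidymodels)

set.seed(123)
samp_size <- 1000

# Same sample data as in the example above
sample_data <- tibble(x = rnorm(samp_size)) %>%
  mutate(
    y = 3 + 2 * x + rnorm(samp_size, 0, .5),
    x2 = x + rnorm(samp_size, 0, .1),
    x3 = c(rep(0, samp_size - 1), 1)
  )

# Estimate the steps once and keep the prepped recipe around
prepped <- recipe(y ~ ., sample_data) %>%
  step_nzv(all_numeric_predictors()) %>%
  step_corr(all_numeric_predictors()) %>%
  prep()

# Columns still present after all steps, restricted to predictors
surviving <- summary(prepped) %>%
  filter(role == "predictor") %>%
  pull(variable)

surviving

# Columns removed by each individual step
tidy(prepped, number = 1) # step_nzv
tidy(prepped, number = 2) # step_corr
```

With a few hundred predictors, `surviving` could then be passed to `select(all_of(surviving))` on the original data to report on only the variables that made it through.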
I'll watch your development closely. I knew the tidymodels team would be on top of this! And, just to give a use case, I'm often asked how the predictor variables differ across certain groups. If I start with maybe 500 variables but only 300 make it past the step_XXX() phase, it would be nice to report on only those 300 rather than all 500.
I appreciate you answering the question so quickly.