I'm using recipes with tidymodels and can't get the preprocessing steps to reliably output the data I expect. As you can see from the reprex, I believe bake() with new_data = NULL should return the processed training set when the recipe is prepped with retain = T. However, the baked data is not centered or scaled, the correlation threshold does not reduce the predictors, the PCA components are not output, and so on. My objective is to verify that each step is working properly before training. Any ideas what I'm doing wrong?
library(tidymodels)
library(textrecipes)
library(themis)
library(embed)
library(glue) # glue() builds the formula string below; tidymodels does not attach it
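set.seed(123) # arbitrary seed so the sampled data is reproducible
t <- "Target" # name of the outcome column
# Two-level outcome, one four-level factor, and two integer predictors
Target <- as.factor(sample(c("A", "B"), 100, replace = TRUE))
Other <- as.factor(sample(c("AA", "BB", "CCC", "DDD"), 100, replace = TRUE))
Numb1 <- sample(1:100, 100, replace = TRUE)
Numb2 <- sample(1:100, 100, replace = TRUE)
df <- tibble(Target, Other, Numb1, Numb2)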
rec <- recipe(as.formula(glue("{t} ~ .")), data = df) %>%
  step_zv(all_predictors(), skip = F) %>%
  step_impute_knn(all_predictors(), skip = F) %>%
  step_dummy(all_nominal_predictors(), one_hot = T, skip = F) %>%
  step_clean_names(all_predictors(), skip = F) %>%
  step_center(all_numeric_predictors(), skip = F) %>%
  step_scale(all_numeric_predictors(), skip = F) %>%
  step_corr(all_numeric_predictors(), threshold = 0, method = "pearson", skip = F) %>%
  # all_of() selects the column whose name is stored in the string t
  step_upsample(all_of(t), skip = T) %>%
  step_pca_truncated(all_numeric_predictors(), num_comp = 5, skip = F)
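# Estimate every step on the training data; retain = T keeps the processed training set
prepped <- prep(rec, training = df, retain = T)
# new_data = NULL returns the retained, processed training set
bake(prepped, new_data = NULL)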
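
As a minimal sketch of the kind of verification described above (not a diagnosis of the recipe itself), the snippet below preps a much smaller recipe and checks that centering and scaling took effect in the baked training data. The toy data frame and the names toy, toy_rec, toy_prepped, and toy_baked are hypothetical and exist only for this illustration. It assumes only the recipes package is needed.

```r
library(recipes)

set.seed(123) # arbitrary seed, only so the illustration is reproducible

# Hypothetical toy data with the same kind of columns as the reprex
toy <- data.frame(
  Target = factor(sample(c("A", "B"), 100, replace = TRUE)),
  Numb1  = sample(1:100, 100, replace = TRUE),
  Numb2  = sample(1:100, 100, replace = TRUE)
)

toy_rec <- recipe(Target ~ ., data = toy) %>%
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors())

toy_prepped <- prep(toy_rec, training = toy, retain = TRUE)
toy_baked   <- bake(toy_prepped, new_data = NULL)

# After centering and scaling, each numeric predictor should have
# a mean of about 0 and a standard deviation of about 1
baked_means <- sapply(toy_baked[c("Numb1", "Numb2")], mean)
baked_sds   <- sapply(toy_baked[c("Numb1", "Numb2")], sd)
print(baked_means)
print(baked_sds)

# tidy() lists the steps in the prepped recipe; tidy(number = i)
# shows what step i estimated from the training data
print(tidy(toy_prepped))
print(tidy(toy_prepped, number = 1))
```

The same pattern (bake, then summarise the columns, then inspect individual steps with tidy()) could be repeated one step at a time on the full recipe to see where the output first departs from what is expected.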
For reference, I checked this flow against the recipes article on sequencing steps, "Ordering of steps".