tidymodels Recipe Sequence

Based on additional research, here's what I came up with as a generic starting template (noting that not every problem would use every step, and not necessarily in this order):

  1. Define custom roles for variables: update_role()
  2. Preliminary feature selection/removal
    a. manually select/remove: step_select() and/or step_rm()
    b. remove zero variance/near-zero variance predictors: step_zv() and/or step_nzv()
    c. remove highly-correlated features: step_corr()
  3. Observation removal/filtering and imputation
    a. remove rows with missing values: step_naomit()
    b. remove observations with extreme outliers: ???
    c. Impute missing values (various methods): step_impute_*()
  4. Quantitative variable transformations
    a. Transform for skewness or other issues: step_log(), step_sqrt(), step_BoxCox(), step_YeoJohnson(), etc.
    b. Discretize continuous variables (if needed and if you have no other choice): step_discretize() or step_cut()
  5. Categorical variable transformations
    a. Handle factor levels (e.g., pool infrequent levels into an "other" category): step_other()
    b. Create dummy variables: step_dummy()
    c. Remove zero variance/near-zero variance predictors AGAIN (after creating dummy variables): step_zv() and/or step_nzv()
  6. Creation of interaction and polynomial terms: step_interact(), step_poly(), etc.
  7. Scale/normalize numeric data (which now includes dummy variables): step_normalize(), step_center(), step_scale(), step_range(), etc.
  8. Algorithmic feature selection (performed manually or with functions from the colino package, for example): step_select_roc(), step_select_vip(), etc.
  9. Multivariate transformation: step_pca(), step_pls(), etc.
  10. Upsample/downsample data to address class imbalance (themis package): step_upsample(), step_downsample(), step_smote(), etc.
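For reference, here is a minimal sketch of how several of the steps above might chain together in one recipe. It uses only the recipes package and a small simulated data set. The data, column names (`income`, `age2`, `region`, `const`), thresholds, and the `step_filter()` outlier rule are all hypothetical choices for illustration. As far as I know, core recipes has no dedicated outlier-removal step, so `step_filter()` with a hand-written condition is just one option. Steps 2a, 3a, 4b, 6, 8, and 10 are left out.

```r
library(recipes)

# Hypothetical simulated data, for illustration only
set.seed(123)
n <- 200
dat <- data.frame(
  id     = seq_len(n),                           # row identifier, not a predictor
  income = rlnorm(n, meanlog = 10, sdlog = 1),   # strongly right-skewed
  age    = runif(n, 18, 80),
  region = factor(sample(c("north", "south", "east", "west", "island"), n,
                         replace = TRUE,
                         prob = c(0.35, 0.30, 0.20, 0.13, 0.02))),  # one rare level
  const  = 1,                                    # zero-variance column
  y      = rnorm(n)
)
dat$age2 <- dat$age + rnorm(n, sd = 1)           # near-duplicate of age
dat$income[sample(n, 10)] <- NA                  # inject some missing values

rec <- recipe(y ~ ., data = dat) |>
  # 1. Keep `id` in the data but out of the predictor set
  update_role(id, new_role = "id") |>
  # 2b. Drop zero-variance predictors (removes `const`)
  step_zv(all_predictors()) |>
  # 2c. Drop one variable from each highly correlated pair (`age` vs `age2`)
  step_corr(all_numeric_predictors(), threshold = 0.9) |>
  # 3b. Hypothetical outlier rule: drop the top 1% of income, keep NAs for imputation
  step_filter(is.na(income) | income < quantile(income, 0.99, na.rm = TRUE)) |>
  # 3c. Impute remaining missing values with the training median
  step_impute_median(all_numeric_predictors()) |>
  # 4a. Reduce skewness
  step_YeoJohnson(income) |>
  # 5a. Pool levels seen in fewer than 5% of training rows into "other"
  step_other(region, threshold = 0.05) |>
  # 5b. Create dummy variables (reference-cell coding by default)
  step_dummy(all_nominal_predictors()) |>
  # 5c. Dummy columns can be constant, so check again
  step_zv(all_predictors()) |>
  # 7. Center and scale every numeric predictor, dummies included
  step_normalize(all_numeric_predictors()) |>
  # 9. Replace the predictors with the first three principal components
  step_pca(all_numeric_predictors(), num_comp = 3)

prepped <- prep(rec, training = dat)        # estimate every step on the training data
baked   <- bake(prepped, new_data = NULL)   # processed training set
print(baked)
```

`prep()` estimates everything (medians, the Yeo-Johnson lambda, pooled levels, PCA loadings) from the training data only. `step_filter()` defaults to `skip = TRUE`, so the outlier rule is not applied when baking new data. The themis sampling steps behave the same way by default, which is one reason they usually sit at the end of the recipe.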

Critiques and suggestions are absolutely welcome!