Hi,
Does anyone know of a good way to exclude certain calculated variables from later steps in recipes? My specific use case is that I create dummy variables from a character variable and then want to center all numeric predictors, but I don't want the dummy variables to be centered. See the example below:
library(dplyr, warn.conflicts = FALSE)
library(recipes, warn.conflicts = FALSE)
library(nycflights13)
small_df <- nycflights13::flights %>%
  select(dep_delay, arr_delay, air_time, origin)
head(small_df)
#> # A tibble: 6 x 4
#>   dep_delay arr_delay air_time origin
#>       <dbl>     <dbl>    <dbl> <chr>
#> 1         2        11      227 EWR
#> 2         4        20      227 LGA
#> 3         2        33      160 JFK
#> 4        -1       -18      183 JFK
#> 5        -6       -25      116 LGA
#> 6        -4        12      150 EWR
rec <- recipe(air_time ~ ., data = small_df)
rec2 <- rec %>%
  step_dummy(origin) %>%
  step_center(all_predictors())
prepped_small <- prep(rec2, small_df) %>% juice()
head(prepped_small)
#> # A tibble: 6 x 5
#>   dep_delay arr_delay air_time origin_JFK origin_LGA
#>       <dbl>     <dbl>    <dbl>      <dbl>      <dbl>
#> 1    -10.6       4.10      227     -0.330     -0.311
#> 2     -8.64     13.1       227     -0.330      0.689
#> 3    -10.6      26.1       160      0.670     -0.311
#> 4    -13.6     -24.9       183      0.670     -0.311
#> 5    -18.6     -31.9       116     -0.330      0.689
#> 6    -16.6       5.10      150     -0.330     -0.311
Created on 2019-10-24 by the reprex package (v0.3.0)
origin_JFK should have values 0 and 1, not -0.33 and 0.67.
Is there a direct way to do it in recipes?
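For context, here is a minimal sketch of two workarounds that might do this. I am not sure either is the intended approach. The first centers before the dummies exist, the second keeps the original step order and deselects the dummy columns by their name prefix. The toy_df data frame and the rec_reordered / rec_excluded names are hypothetical, made up so the example is self-contained, and the "origin_" prefix assumes step_dummy()'s default naming as seen in the output above.

```r
library(dplyr, warn.conflicts = FALSE)
library(recipes, warn.conflicts = FALSE)

# Hypothetical stand-in for the flights subset (first six rows shown above)
toy_df <- data.frame(
  dep_delay = c(2, 4, 2, -1, -6, -4),
  arr_delay = c(11, 20, 33, -18, -25, 12),
  air_time  = c(227, 227, 160, 183, 116, 150),
  origin    = factor(c("EWR", "LGA", "JFK", "JFK", "LGA", "EWR"))
)

# Workaround 1: center first, while origin is still non-numeric,
# and only create the dummy columns afterwards
rec_reordered <- recipe(air_time ~ ., data = toy_df) %>%
  step_center(all_numeric(), -all_outcomes()) %>%
  step_dummy(origin)

# Workaround 2: keep the original order, but drop the dummy columns
# from the selection (assumes they are named with the "origin_" prefix)
rec_excluded <- recipe(air_time ~ ., data = toy_df) %>%
  step_dummy(origin) %>%
  step_center(all_predictors(), -starts_with("origin_"))

baked_reordered <- prep(rec_reordered, toy_df) %>% juice()
baked_excluded  <- prep(rec_excluded, toy_df) %>% juice()

head(baked_excluded)
```

Both rely on naming or ordering conventions rather than on a selector that targets dummy variables as such, which is why I am asking whether something more direct exists.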
Thanks,