I'm building a recipe, and I need to address missing values in binary variables. Those variables contain either 1, 0, or NA. I want to use imputation to replace the NA values, and found step_impute_mode().
However, step_impute_mode() accepts only nominal variables (i.e., of class factor or character). I could first use step_num2factor() and then step_impute_mode(), but that leaves me stuck with variables of class factor, whereas the model engine requires them to be numeric. As far as I can see, the recipes package doesn't have step_*() verbs that convert from factor back to numeric.
So my question is: how can I replace NA by imputing the mode in numeric variables that have values 0 and 1?
Thanks!
EDIT
Here's some toy data to demonstrate the situation. I would like to write a recipe for the formula y ~ ., and I want to replace the missing values in x1 and x2 with each column's own mode.
Furthermore, I want x1 and x2 to remain numeric after the imputation. How can I do it using the recipes package?
set.seed(123)
x1 <- rbinom(100, 1, runif(1))
x2 <- rbinom(100, 1, runif(1))
y <- rbinom(100, 1, runif(1))
# sprinkle some NAs
my_df <- data.frame(y, x1, x2)
my_df[c("x1", "x2")] <-
  lapply(my_df[c("x1", "x2")], function(x) {
    x[sample(seq_along(x), 0.25 * length(x))] <- NA
    x
  })
head(my_df)
#>   y x1 x2
#> 1 1  1  0
#> 2 1  0 NA
#> 3 0  1  0
#> 4 1 NA  1
#> 5 1 NA  1
#> 6 1 NA NA
Created on 2021-12-23 by the reprex package (v2.0.1.9000)
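To make the target behavior concrete, here is a minimal base-R sketch of what "imputing the mode while staying numeric" means for a single 0/1 column. The helper name impute_mode_num() is hypothetical and is not part of recipes; it only illustrates the result I'd like a recipe step to produce, and it assumes that ties are broken in favor of the smaller value.

```r
# Hypothetical helper (not a recipes function): replace NA in a numeric
# vector with its most frequent non-missing value, keeping the vector numeric.
impute_mode_num <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  # which.max() returns the first maximum, so ties go to the smaller value
  mode_value <- as.numeric(names(counts)[which.max(counts)])
  x[is.na(x)] <- mode_value
  x
}

x <- c(1, 0, NA, 1, NA, 1)
imputed <- impute_mode_num(x)
is.numeric(imputed)
```

Note that this computes the mode from whatever vector it is given, whereas a proper recipe step would presumably learn the mode from the training data and reuse it on new data.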