I am trying to replicate code shown in the Tidy Modeling with R (TMWR) book using this Kaggle dataset. However, I am running into some issues with recipes and global model explanations.
Here's the code to reproduce my work -
# Libraries ----
library(tidyverse)
library(janitor)
library(tidymodels)
library(DALEXtra)
# Load Data ----
campaign_tbl_raw <- data.table::fread("../data/marketing_campaign.csv", sep = ";") %>%
clean_names() %>%
as_tibble()
campaign_tbl <- campaign_tbl_raw %>%
filter(!income > 200000) %>%
mutate(response = as.factor(response))
# Data Split ----
set.seed(123)
data_split <- initial_split(campaign_tbl, prop = 0.8, strata = response)
train_tbl <- training(data_split)
test_tbl <- testing(data_split)
# Recipe ----
glmnet_base_recipe <- glmnet_recipe <- recipe(formula = response ~ ., data = train_tbl) %>%
step_rm(starts_with("z_")) %>%
update_role(id, new_role = "indicator") %>%
step_string2factor(one_of(education, marital_status)) %>%
step_mutate(dt_customer = as.numeric(dt_customer)) %>%
step_novel(all_nominal(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes()) %>%
step_zv(all_predictors()) %>%
step_normalize(year_birth, income, dt_customer, recency, starts_with("mnt_"), starts_with("num_")) %>%
themis::step_upsample(response, over_ratio = 0.5)
glmnet_base_recipe %>% prep() %>% juice() %>% glimpse()
First Issue - When I try to prep and glimpse the recipe above, I get the following error -
Error in `instrument_base_errors()`:
! object 'education' not found
Caused by error in `map_lgl()`:
! object 'education' not found
Run `rlang::last_error()` to see where the error occurred.
Unfortunately, I'm not sure how to interpret the "Run `rlang::last_error()` to see where the error occurred" message. However, when I take out the `step_string2factor(one_of(education, marital_status))` step, the recipe works just fine, so I updated the recipe and proceeded -
# Recipe ----
glmnet_base_recipe <- glmnet_recipe <- recipe(formula = response ~ ., data = train_tbl) %>%
step_rm(starts_with("z_")) %>%
update_role(id, new_role = "indicator") %>%
step_mutate(dt_customer = as.numeric(dt_customer)) %>%
step_novel(all_nominal(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes()) %>%
step_zv(all_predictors()) %>%
step_normalize(year_birth, income, dt_customer, recency, starts_with("mnt_"), starts_with("num_")) %>%
themis::step_upsample(response, over_ratio = 0.5)
# Model Spec ----
base_glmnet_spec <- logistic_reg(
penalty = 0.1,
mixture = 0.5
) %>%
set_mode("classification") %>%
set_engine("glmnet")
# Workflow Spec ----
glmnet_base_workflow <- workflow() %>%
add_recipe(glmnet_base_recipe) %>%
add_model(base_glmnet_spec)
# Fit ----
glmnet_base_fit <- glmnet_base_workflow %>%
fit(train_tbl)
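For the first issue, one possible explanation (an assumption, not verified against the full dataset) is that `one_of()` expects column names as character strings, so the bare names `education` and `marital_status` are looked up as R objects and not found. A minimal sketch on a made-up tibble (`toy_tbl` and its values are hypothetical stand-ins for `train_tbl`), passing the bare column names directly to `step_string2factor()`:

```r
library(recipes)

# Hypothetical toy data standing in for train_tbl
toy_tbl <- tibble::tibble(
  response = factor(c(0, 1, 0, 1)),
  education = c("PhD", "Master", "PhD", "Basic"),
  marital_status = c("Single", "Married", "Single", "Married"),
  income = c(50000, 62000, 48000, 71000)
)

# Select the columns by bare name instead of one_of(education, marital_status)
toy_recipe <- recipe(response ~ ., data = toy_tbl) %>%
  step_string2factor(education, marital_status)

# Estimate the recipe and return the processed training data
baked <- bake(prep(toy_recipe), new_data = NULL)
```

If the selection really is the cause, the same change in the original recipe should let `prep()` run with the `step_string2factor()` step left in.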
Second Issue - I am trying to follow the steps used in TMWR to explain models and predictions. In the book, they first build an explainer (see section 18.1). I follow the same steps using the code below -
# Explainer ----
explainer_glmnet <- explain_tidymodels(
glmnet_base_fit,
data = train_tbl,
y = train_tbl$response,
label = "lm base",
verbose = FALSE
)
However, I get a warning -
Warning message:
In Ops.factor(y, predict_function(model, data)) :
  ‘-’ not meaningful for factors
I googled the warning and learned that this message indicates there is a data type not suitable for computation; however, in my case, I'm not sure where it is or how to fix it.
Finally, I try to replicate the global explanations in TMWR (see section 18.3) with the code -
# Variable Importance Via model_parts() ----
set.seed(123)
vip_glmnet <- model_parts(explainer_glmnet, loss_function = loss_one_minus_auc)
However, I get the following error -
Error in Summary.factor(1L, na.rm = FALSE) :
  ‘sum’ not meaningful for factors
I'm assuming this has something to do with the earlier warning message, but I'm at a loss for how to fix it. Any help will be appreciated.
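One possible reading of the warning, offered as a sketch rather than a confirmed fix: the call in the warning is `Ops.factor(y, predict_function(model, data))`, which suggests the explainer subtracts predictions from `y`, and `y = train_tbl$response` is a factor. The base R snippet below uses a made-up factor (a hypothetical stand-in for `train_tbl$response`, assuming its levels are "0" and "1") to show the behaviour and a numeric conversion that might be passed as `y` instead.

```r
# Hypothetical stand-in for train_tbl$response: a factor with levels "0" and "1"
response <- factor(c("0", "1", "0", "0", "1"))

# Arithmetic on a factor is not defined: R warns and returns NA for every element
residual_attempt <- suppressWarnings(response - 0.5)

# Convert through character so the labels "0"/"1" become the numbers 0/1;
# as.numeric(response) alone would return the internal level codes 1/2
response_num <- as.numeric(as.character(response))

# Assumed use with the objects from the question (not run here):
# explainer_glmnet <- explain_tidymodels(
#   glmnet_base_fit,
#   data = train_tbl,
#   y = as.numeric(as.character(train_tbl$response)),
#   label = "lm base",
#   verbose = FALSE
# )
```

Whether this also resolves the `model_parts()` error is untested here; it would only help if that error comes from the same factor `y`.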