I'm trying to use Julia Silge's xgboost tutorial (Tune XGBoost with tidymodels and #TidyTuesday beach volleyball | Julia Silge) to build an analysis on an imbalanced dataset. I made a single change to her code: adding upsampling (e.g., step_upsample(...)) as a recipe step. However, when I use a recipe() call rather than the add_formula() in her code, the tuning step fails. For example, inserting the recipe call here:
xgb_wf <- workflow() %>%
add_recipe(recipe(win ~ ., data = vb_train)) %>%
# add_formula(win ~ .) %>%
add_model(xgb_spec)
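For context, the full version I'm ultimately trying to get working looks roughly like this (step_upsample() is from the themis package; using win as the class column is my read of the tutorial's setup):

```r
library(tidymodels)
library(themis)  # provides step_upsample()

# Roughly what I'm aiming for: the tutorial's workflow, with the
# single change of upsampling the minority class in a recipe step
vb_rec <- recipe(win ~ ., data = vb_train) %>%
  step_upsample(win)  # over_ratio left at its default of 1

xgb_wf <- workflow() %>%
  add_recipe(vb_rec) %>%
  add_model(xgb_spec)
```

The error below happens even with the bare recipe (no upsampling step), so I've shown that minimal case above.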
══ Workflow ═════════
Preprocessor: Recipe
Model: boost_tree()

── Preprocessor ────────
0 Recipe Steps

── Model ───────────
Boosted Tree Model Specification (classification)

Main Arguments:
  mtry = tune()
  trees = 1000
  min_n = tune()
  tree_depth = tune()
  learn_rate = tune()
  loss_reduction = tune()
  sample_size = tune()

Computational engine: xgboost
But then the tune_grid() step fails with an error:
xgb_res <- tune_grid(
xgb_wf,
resamples = vb_folds,
grid = xgb_grid,
control = control_grid(save_pred = TRUE)
)
Fold10: preprocessor 1/1, model 30/30: Error in xgboost::xgb.DMatrix(x, label = y, missing = NA): 'data' has class 'character' and length 193500.
Does anyone have any hints on what I can do to fix it?
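In case it helps with diagnosis, here is a sketch of a check I'm thinking of running: prep and bake the bare recipe to see which column types actually reach the model. My guess (unconfirmed) is that character predictors survive, which add_formula() would have dummy-coded automatically.

```r
# Sketch: inspect what the recipe hands to xgboost
baked <- recipe(win ~ ., data = vb_train) %>%
  prep() %>%
  bake(new_data = NULL)

# If any predictors are still character (or factor), xgb.DMatrix()
# would choke, which matches the 'data' has class 'character' error
sapply(baked, class)
```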
There are similar error messages reported in "tidymodels: error when predicting on new data with xgboost model" and in "xgboost works with add_formula but not with recipe".
Thanks,
Rich