Hi,
I am trying to use the output of tune_grid() as shown in the first part of chapter 14 of TMwR. I am running an xgboost model, but for some reason I get Error in check_gp_failure() when running tune_bayes(). This happens whether I pass the output of tune_grid() as the initial argument or use an integer, as suggested by the help page.
Reprex attached for reference:
library(tidyverse)
library(tidymodels)
set.seed(41)
# Load data
data(cells)
cells <- cells %>% select(-case)
# Split
cells_split <- initial_split(cells)
cells_train <- training(cells_split)
cells_test <- testing(cells_split)
cells_folds <- vfold_cv(cells_train, v = 5)
# Preprocessor
cells_recipe <- recipe(class ~ ., data = cells_train) %>%
  step_nzv(all_predictors()) %>%
  step_corr(all_predictors())
# Create model
xgb_spec <- boost_tree(
  mode = "classification",
  trees = tune(),
  mtry = tune(),
  tree_depth = tune(),
  min_n = tune(),
  sample_size = tune(),
  loss_reduction = tune(),
  learn_rate = tune()
) %>%
  set_engine("xgboost", importance = "permutation")
# Merge into workflow
cells_wf <- workflow() %>%
  add_model(xgb_spec) %>%
  add_recipe(cells_recipe)
xgb_grid <- grid_latin_hypercube(
  trees(),
  tree_depth(),
  min_n(),
  loss_reduction(),
  sample_size = sample_prop(),
  finalize(mtry(), cells_train),
  learn_rate(),
  size = 20
)
# Build grid and tune
xgb_tune_results <- tune_grid(
  cells_wf,
  resamples = cells_folds,
  grid = xgb_grid,
  control = control_grid(save_pred = TRUE),
  metrics = metric_set(roc_auc)
)
bayes_param <- cells_wf %>%
  extract_parameter_set_dials()
xgb_tune_bayes <- cells_wf %>%
  tune_bayes(
    iter = 10,
    resamples = cells_folds,
    param_info = bayes_param,
    metrics = metric_set(roc_auc),
    initial = xgb_tune_results,
    control = control_bayes(save_pred = TRUE, verbose = TRUE)
  )
#> Optimizing roc_auc using the expected improvement
#>
#> ── Iteration 1 ─────────────────────────────────────────────────────────────────
#>
#> i Current best: roc_auc=0.8958 (@iter 0)
#> i Gaussian process model
#> x Gaussian process model: Error in `.f()`:
#> ! The parameter object contains...
#> Error in `check_gp_failure()`:
#> ! Gaussian process model was not fit.
#> ✖ Optimization stopped prematurely; returning current results.
I can't seem to find the problem here, but clearly I'm missing something.
Any help is greatly appreciated.