Hi, I am a newcomer to tidymodels (and R in general) and have worked through the (fantastic) online literature that is available from yourselves and other bloggers to very nearly run a complete workflow.
I am able to successfully make predictions from several models independently but wanted to use the 'stacks' process to improve the overall prediction.
Here is a snippet of my code so far:
# build recipe
trainDataHomeXg.recipe <- training(lgDataHomeXg.split) %>%
  recipe(labelHomeXg ~ .) %>%
  step_normalize(all_predictors()) %>%
  step_sqrt(all_outcomes()) %>%
  prep()
# bake
trainDataHomeXg.bake <-
  bake(trainDataHomeXg.recipe, training(lgDataHomeXg.split))
trainDataHomeXg.folds <-
  trainDataHomeXg.bake %>% vfold_cv(v = 5, repeats = 2)
# build XGBoost model
xGTrainDataXg.model <-
  parsnip::boost_tree(
    mode = "regression",
    trees = tune(),
    min_n = tune(),
    tree_depth = tune(),
    learn_rate = tune(),
    loss_reduction = NULL,
    stop_iter = NULL
  ) %>%
  set_engine("xgboost")
# create workflow
xGTrainDataHomeXg.wflow <-
  workflow() %>%
  add_recipe(trainDataHomeXg.recipe) %>%
  add_model(xGTrainDataXg.model)
# tune model
xGTrainDataHomeXg.tuneGridTight <- xGTrainDataHomeXg.wflow %>% tune_grid(
  resamples = trainDataHomeXg.folds,
  metrics = metric_set(rmse),
  grid = 200,
  control = control_stack_grid(),
  param_info = parameters(
    trees(range = c(550, 1650)),
    min_n(range = c(50, 100)),
    tree_depth(range = c(4, 12)),
    learn_rate(range = c(-2.5, -0.3), trans = log10_trans())
  )
)
# I also build 3 other models (mlp, random forest and kNN), but the error
# occurs even if I add just the one model
trainDataHomeXg.stack <- stacks() %>%
  add_candidates(xGTrainDataHomeXg.tuneGridTight)
trainDataHomeXg.stackBlendPred <-
  trainDataHomeXg.stack %>% blend_predictions()
trainDataHomeXg.stackFitMembers <-
  trainDataHomeXg.stackBlendPred %>% fit_members()
trainDataHomeXg.stackPred <- testing(lgDataHomeXg.split) %>%
  bind_cols(predict(trainDataHomeXg.stackFitMembers, .))
It's at this stage that I receive an error:
Error in sqrt(getElement(new_data, col_names[i])) : non-numeric argument to mathematical function
I have tried the usual online resources but am struggling to make any headway. I'm not sure whether the sqrt is part of the 'rmse' calculation or comes from the 'step_sqrt' I put in the recipe. I tried baking the test data before passing it to predict(), but this doesn't help.
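To help narrow down where the sqrt call comes from, here is a minimal base-R sketch of one possible explanation. It assumes (unconfirmed) that the outcome column is not present in the data a recipe step receives at predict time; the data frame and the predictorA column are hypothetical stand-ins, not my real data.

```r
# Hypothetical stand-in for the data a recipe step might see at predict time,
# ASSUMING the outcome column labelHomeXg has been dropped beforehand.
new_data <- data.frame(predictorA = c(1.2, 3.4))

# getElement() returns NULL when the requested column does not exist
outcome <- getElement(new_data, "labelHomeXg")

# Taking sqrt() of that NULL raises an error; capture its message
msg <- tryCatch(
  {
    sqrt(outcome)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(is.null(outcome))
print(msg)
```

If that assumption holds, the message captured here should match the one in my error, which would point at 'step_sqrt(all_outcomes())' rather than at 'rmse'. In that case, things to try might be setting skip = TRUE on the step, or transforming the outcome before splitting the data instead of inside the recipe.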
This is my first time asking online, so if you need me to provide more information, or the information in a different format, please let me know.
Thanks
Chris