I have an `rsplit` object created by `rsample::initial_time_split()`.
Now I want to create just one validation set based on one column or on the row order. I tried `validation_split()`, but it only allows random sampling. I then tried `group_vfold_cv()`, which gave the appropriate grouping but, as the name says, performs cross-validation and therefore gives me 2 resamples:
```r
folds = group_vfold_cv(training(df_split), group = 'column')
```

```
# Group 2-fold cross-validation
# A tibble: 2 x 2
  splits                 id
  <list>                 <chr>
1 <rsplit [40912/72608]> Resample1
2 <rsplit [72608/40912]> Resample2
```
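For the "order" case, a minimal sketch (not tested against real `initial_time_split()` output) would build the single split by hand: sort by the ordering column, pass explicit analysis/assessment row indices to `rsample::make_splits()`, and wrap the result with `rsample::manual_rset()` so it keeps the `rset` class. This assumes a recent rsample version that provides both functions; the `time_index` column and the 75/25 proportion are hypothetical placeholders.

```r
library(rsample)

# Toy data standing in for training(df_split); `time_index` is a hypothetical
# ordering column (replace it with your own date or order column).
train <- data.frame(
  time_index = 1:100,
  x = runif(100),
  y = runif(100)
)

# Sort by the ordering column so the last 25% of rows become the validation set.
train <- train[order(train$time_index), ]
n_analysis <- floor(0.75 * nrow(train))

# make_splits() builds one rsplit from explicit analysis/assessment row indices.
ordered_split <- make_splits(
  list(analysis = seq_len(n_analysis),
       assessment = (n_analysis + 1):nrow(train)),
  data = train
)

# manual_rset() wraps the single split in a one-row rset.
val_set <- manual_rset(list(ordered_split), ids = "validation")
val_set
```

The resulting `val_set` could then be passed as `resamples =` to `tune_grid()` in place of a filtered set of folds.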
I would like to do something like this:

```r
folds = group_vfold_cv(training(df_split), group = 'column') %>%
  filter(id == "Resample2")
```
But this drops the `rset` class and converts the object to a plain tibble, which the tuning function (`tune_grid()`) does not recognize.
Does anyone know a way to accomplish this?
Here is a reprex of what I would like to do:
```r
library(tidymodels)

df = tibble(x = runif(100, 0, 1), y = runif(100, 0, 1), group_column = rep(c(1, 0), 50))
df_split = initial_split(df, prop = 3/4)

# the filter changes the class that is needed for the tune_grid function
folds = group_vfold_cv(training(df_split), group = 'group_column') %>%
  filter(id == "Resample2")

boost_spec <- parsnip::boost_tree(
  trees = tune(),
  tree_depth = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

recipe <- recipe(y ~ ., data = head(training(df_split)))

boost_workflow = workflow() %>%
  add_recipe(recipe) %>%
  add_model(boost_spec)

set.seed(123)
boost_grid <- grid_max_entropy(
  trees(),
  tree_depth(),
  size = 2)

boost_res = boost_workflow %>%
  tune_grid(resamples = folds,
            grid = boost_grid,
            metrics = metric_set(rmse))
```
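A possible fix for the reprex, assuming rsample's `manual_rset(splits, ids)` is available (it exists in recent rsample releases): rather than calling `filter()` on the folds, which drops the `rset` class, select the wanted split and rebuild a one-row `rset` from it. This is a sketch; the id `"Resample2"` simply mirrors the filter in the question and depends on how the groups are assigned.

```r
library(tidymodels)

set.seed(123)
df <- tibble(
  x = runif(100, 0, 1),
  y = runif(100, 0, 1),
  group_column = rep(c(1, 0), 50)
)
df_split <- initial_split(df, prop = 3/4)

# Grouped folds as in the question; with two groups this yields two resamples.
folds <- group_vfold_cv(training(df_split), group = "group_column")

# Pick the wanted split and rebuild a one-row rset instead of using filter(),
# so the result still inherits from "rset".
keep <- folds$id == "Resample2"
one_fold <- manual_rset(folds$splits[keep], ids = folds$id[keep])

class(one_fold)
```

`one_fold` would then replace `folds` in `tune_grid(resamples = one_fold, ...)`; the model spec, recipe, and grid from the reprex stay the same.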
Thanks a lot!