I am trying to fit a multivariate time series using a LightGBM model. To build the model I am using mlexperiments and mllrnrs.
- splitting the time series using timetk and rsample
splits <- production %>%
  time_series_split(date_var = newdate, assess = "4 months", cumulative = TRUE)
train <- rsample::training(splits) %>% select(-newdate)
test <- rsample::testing(splits) %>% select(-newdate)
- creating time series folds
fold_list <- splitTools::create_timefolds(y = unlist(train_y), k = 5L, use_names = TRUE, type = "extending")
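To make the shape of these folds concrete, here is a minimal base R sketch of extending-window folds. The helper make_extending_folds is hypothetical and only mimics the nested in-sample/out-of-sample layout that splitTools::create_timefolds is documented to return. The exact block boundaries are an assumption and may differ from what the package produces.

```r
# Hypothetical helper (not part of splitTools): builds k extending-window
# folds over n time-ordered rows. Each fold is a list holding two integer
# vectors, `insample` (training rows) and `outsample` (validation rows).
make_extending_folds <- function(n, k = 5L) {
  # Split rows 1..n into k + 1 consecutive blocks of equal width.
  block <- cut(seq_len(n), breaks = k + 1L, labels = FALSE)
  folds <- lapply(seq_len(k), function(i) {
    list(
      insample  = which(block <= i),      # every row up to and including block i
      outsample = which(block == i + 1L)  # the block that directly follows
    )
  })
  names(folds) <- paste0("Fold", seq_len(k))
  folds
}

toy_folds <- make_extending_folds(n = 12L, k = 3L)
str(toy_folds$Fold1)
```

Each fold here is itself a list with two elements, not a single vector of row indices, which matters for any function that expects one integer vector per fold.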
- setting arguments and parameter grid
# required learner arguments, not optimized
learner_args <- list(
  max_depth = -1L,
  verbose = -1L,
  objective = "regression",
  metric = "l2"
)
# set arguments for the predict function and performance metric,
# required for mlexperiments::MLCrossValidation and mlexperiments::MLNestedCV
predict_args <- NULL
performance_metric <- metric("rmse")
performance_metric_args <- NULL
return_models <- TRUE
# required for grid search
parameter_grid <- expand.grid(
  bagging_fraction = seq(0.6, 0.8, .2),
  feature_fraction = seq(0.6, 0.8, .2),
  min_data_in_leaf = seq(20, 40, 4),
  learning_rate = seq(0.1, 0.2, 0.1),
  num_leaves = seq(2, 20, 4)
)
optim_args <- list(
  iters.n = ncores,
  kappa = 3.5,
  acq = "ucb"
)
- tuning the model
tuner <- mlexperiments::MLTuneParameters$new(
  learner = mllrnrs::LearnerLightgbm$new(
    metric_optimization_higher_better = FALSE
  ),
  strategy = "grid",
  ncores = ncores,
  seed = seed
)
tuner$parameter_grid <- parameter_grid
tuner$learner_args <- learner_args
tuner$set_data(x = train_x, y = train_y)
tuner_results_grid <- tuner$execute(k = 3)
Up to this point the code runs without problems.
But when I start the nested cross-validation:
validator <- mlexperiments::MLNestedCV$new(
  learner = mllrnrs::LearnerLightgbm$new(
    metric_optimization_higher_better = FALSE
  ),
  strategy = "grid",
  fold_list = fold_list,
  k_tuning = 3L,
  ncores = ncores,
  seed = seed
)
validator$parameter_grid <- parameter_grid
validator$learner_args <- learner_args
validator$split_type <- "stratified"
validator$predict_args <- predict_args
validator$performance_metric <- performance_metric
validator$performance_metric_args <- performance_metric_args
validator$return_models <- return_models
validator$set_data(
  x = train_x,
  y = train_y
)
validator_results <- validator$execute()
I get the following error:
CV fold: Fold1
Error in kdry::mlh_subset(private$x, train_index) :
  `ids` must be an integer
When I checked the validator environment I found that...
The following line in my code
fold_list = fold_list
is not working. mlexperiments and mllrnrs do not seem to accept the time series splitting output, which contains in-sample and out-of-sample indices for each fold.
How can I resolve this? Why do mlexperiments and mllrnrs not support time series splitting?
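The suspected mismatch can be illustrated in base R. This is only a sketch under two assumptions that have not been verified against mlexperiments: that the fold list has the nested insample/outsample layout, and that the package expects one integer vector of training row indices per fold (as splitTools::create_folds would return). The object nested_folds is made-up toy data, not my real folds.

```r
# Toy nested fold list in the assumed create_timefolds layout.
nested_folds <- list(
  Fold1 = list(insample = 1:4, outsample = 5:6),
  Fold2 = list(insample = 1:6, outsample = 7:8)
)

# A single fold is a list, so an "is this an integer vector?" check fails.
is.integer(nested_folds$Fold1)           # FALSE
is.integer(nested_folds$Fold1$insample)  # TRUE

# Possible workaround: keep only the training indices, which gives a
# named list with one integer vector per fold.
flat_folds <- lapply(nested_folds, function(fold) as.integer(fold$insample))
str(flat_folds)
```

If the package then treats every row outside the training indices as the test set, the held-out part of each fold would be all later rows and not only the next block, so the evaluation would differ from what the outsample indices describe.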