How to do cross-validation for a multivariate time series LightGBM model using mlexperiments and mllrnrs

I am trying to fit a multivariate time series with a LightGBM model. To build the model, I am using mlexperiments and mllrnrs.

1. Splitting the time series using timetk and rsample

```r
library(dplyr)
library(timetk)
splits <- production %>%
  time_series_split(date_var = newdate, assess = "4 months", cumulative = TRUE)
train <- rsample::training(splits) %>% select(-newdate)
test <- rsample::testing(splits) %>% select(-newdate)
```
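The split above is a chronological holdout: with `cumulative = TRUE`, everything before the 4-month assessment window is used for training. The base-R sketch below approximates that idea on hypothetical monthly data (the `production` frame here is made up, and timetk's exact boundary handling may differ).

```r
# Hypothetical monthly data standing in for the real `production` table
production <- data.frame(
  newdate = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  value = seq_len(24)
)

# Cutoff date: 4 months before the last observation
cutoff <- seq(max(production$newdate), by = "-4 months", length.out = 2)[2]

# Chronological holdout: train on rows up to the cutoff, test on the rest
train <- production[production$newdate <= cutoff, ]
test <- production[production$newdate > cutoff, ]

nrow(train)  # 20
nrow(test)   # 4
```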
2. Creating the time series folds

```r
# train_x and train_y are built from `train` (not shown in this post)
fold_list <- splitTools::create_timefolds(y = unlist(train_y), k = 5L, use_names = TRUE, type = "extending")
```
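It helps to look at the shape of these folds. Assuming, as the splitTools documentation describes, that each time fold holds an `insample` and an `outsample` index vector, the hypothetical helper `make_extending_folds()` below mimics that shape in base R (it is not the splitTools implementation): the rows are cut into `k + 1` consecutive blocks, and fold `i` trains on blocks `1..i` and evaluates on block `i + 1`.

```r
# Hypothetical helper: mimics the *shape* of extending time folds
make_extending_folds <- function(n, k = 5L) {
  # Split the n time-ordered rows into k + 1 consecutive blocks
  block <- cut(seq_len(n), breaks = k + 1L, labels = FALSE)
  folds <- lapply(seq_len(k), function(i) {
    list(
      insample = which(block <= i),       # blocks 1..i (window grows each fold)
      outsample = which(block == i + 1L)  # the block right after
    )
  })
  names(folds) <- paste0("Fold", seq_len(k))
  folds
}

folds <- make_extending_folds(n = 12L, k = 5L)
str(folds$Fold1)  # each fold is a list of two index vectors, not one vector
```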
3. Setting the arguments and the parameter grid

```r
# required learner arguments, not optimized
learner_args <- list(
  max_depth = -1L,
  verbose = -1L,
  objective = "regression",
  metric = "l2"
)

# set arguments for predict function and performance metric,
# required for mlexperiments::MLCrossValidation and mlexperiments::MLNestedCV
predict_args <- NULL
performance_metric <- mlexperiments::metric("rmse")
performance_metric_args <- NULL
return_models <- TRUE

# required for grid search
parameter_grid <- expand.grid(
  bagging_fraction = seq(0.6, 0.8, .2),
  feature_fraction = seq(0.6, 0.8, .2),
  min_data_in_leaf = seq(20, 40, 4),
  learning_rate = seq(0.1, 0.2, 0.1),
  num_leaves = seq(2, 20, 4)
)

# required for Bayesian optimization (not used with strategy = "grid")
optim_args <- list(
  iters.n = ncores,
  kappa = 3.5,
  acq = "ucb"
)
```
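To get a feel for the cost of the grid strategy, the grid above can be counted directly. The fit count is only a rough estimate that assumes every combination is evaluated on each of the `k_tuning = 3` inner folds within each of the 5 outer folds; any final refits per outer fold are not included.

```r
# Same grid as above, with the number of values per parameter noted
parameter_grid <- expand.grid(
  bagging_fraction = seq(0.6, 0.8, .2),  # 2 values
  feature_fraction = seq(0.6, 0.8, .2),  # 2 values
  min_data_in_leaf = seq(20, 40, 4),     # 6 values
  learning_rate = seq(0.1, 0.2, 0.1),    # 2 values
  num_leaves = seq(2, 20, 4)             # 5 values
)

n_combos <- nrow(parameter_grid)    # 2 * 2 * 6 * 2 * 5 = 240
n_inner_fits <- 5L * 3L * n_combos  # 5 outer folds * 3 inner folds * 240 = 3600
```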
4. Tuning the model

```r
# ncores and seed are set earlier (not shown)
tuner <- mlexperiments::MLTuneParameters$new(
  learner = mllrnrs::LearnerLightgbm$new(
    metric_optimization_higher_better = FALSE
  ),
  strategy = "grid",
  ncores = ncores,
  seed = seed
)
tuner$parameter_grid <- parameter_grid
tuner$learner_args <- learner_args
tuner$set_data(x = train_x, y = train_y)
tuner_results_grid <- tuner$execute(k = 3)
```

Up to this point, the code runs without errors.

However, when I run the nested cross-validation:

```r
validator <- mlexperiments::MLNestedCV$new(
  learner = mllrnrs::LearnerLightgbm$new(
    metric_optimization_higher_better = FALSE
  ),
  strategy = "grid",
  fold_list = fold_list,
  k_tuning = 3L,
  ncores = ncores,
  seed = seed
)
validator$parameter_grid <- parameter_grid
validator$learner_args <- learner_args
validator$split_type <- "stratified"
validator$predict_args <- predict_args
validator$performance_metric <- performance_metric
validator$performance_metric_args <- performance_metric_args
validator$return_models <- return_models
validator$set_data(
  x = train_x,
  y = train_y
)
validator_results <- validator$execute()
```

I get the following error:

```
CV fold: Fold1
Error in kdry::mlh_subset(private$x, train_index) :
  ids must be an integer
```
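A likely cause, assuming mlexperiments passes each element of `fold_list` to `kdry::mlh_subset()` as the training row indices: with time folds, each element is a list of two index vectors rather than an integer vector, so an integer check on it fails. The mock `fold_list` below is hypothetical and only reproduces that nested shape.

```r
# Hypothetical fold list with the nested shape of create_timefolds() output
fold_list <- list(
  Fold1 = list(insample = 1:4, outsample = 5:6),
  Fold2 = list(insample = 1:6, outsample = 7:8)
)

train_index <- fold_list[["Fold1"]]
is.integer(train_index)           # FALSE: a list holding two index vectors
is.integer(train_index$insample)  # TRUE: the kind of input an integer check expects
```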

When I inspected the validator object, I found that the argument `fold_list = fold_list` is what fails. It seems that mlexperiments and mllrnrs cannot accept the output of a time series split, which contains in-sample and out-of-sample indices for each fold.

How can I resolve this? Why do mlexperiments and mllrnrs not support time series splitting?
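One workaround to try, assuming `fold_list` in `MLNestedCV$new()` expects one integer vector of training rows per fold (the shape `splitTools::create_folds()` returns by default): keep only the `insample` indices. The nested list here is again a hypothetical stand-in for the real folds, and how mlexperiments chooses the held-out rows is an assumption flagged in the comments.

```r
# Hypothetical nested folds shaped like splitTools::create_timefolds() output
fold_list <- list(
  Fold1 = list(insample = 1:4, outsample = 5:6),
  Fold2 = list(insample = 1:6, outsample = 7:8)
)

# Keep one integer vector of training (in-sample) rows per fold
fold_list_train <- lapply(fold_list, function(f) as.integer(f$insample))

# If the held-out rows are taken as the complement of the training rows
# (an assumption about mlexperiments), Fold1 would test on 5:8, not only 5:6
n_rows <- 8L
held_out_fold1 <- setdiff(seq_len(n_rows), fold_list_train$Fold1)
```

The flattened list would then be passed as `fold_list = fold_list_train`. Because of the caveat in the last comment, it is worth checking which rows each outer fold actually evaluates on before reading the scores as a proper time series backtest.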
