How to do cross-validation for a multivariate time series LightGBM model using mlexperiments and mllrnrs

sahi · April 8, 2024, 2:45pm

I am trying to fit a multivariate time series with a LightGBM model. To build the model I am using mlexperiments and mllrnrs.

First I split the time series using timetk and rsample:

    library(dplyr)
    library(timetk)
    splits <- production %>%
      time_series_split(date_var = newdate, assess = "4 months", cumulative = TRUE)
    train <- rsample::training(splits) %>% select(-newdate)
    test <- rsample::testing(splits) %>% select(-newdate)
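
The later snippets use `train_x` and `train_y`, which are not defined anywhere in the post. A minimal sketch of how they might be derived from `train`, assuming a hypothetical response column called `target` and toy data in place of the real `production` table:

```r
# Toy stand-in for the `train` data frame from the split above.
# All column names here, including `target`, are hypothetical.
train <- data.frame(
  feature_a = c(1.2, 3.4, 5.6, 7.8),
  feature_b = c(10, 20, 30, 40),
  target    = c(100, 110, 125, 150)
)

target_col <- "target"  # assumption: name of the response column

# Features as a numeric matrix, response as a plain vector
feature_cols <- setdiff(colnames(train), target_col)
train_x <- data.matrix(train[, feature_cols, drop = FALSE])
train_y <- train[[target_col]]

dim(train_x)
length(train_y)
```

The row order of `train_x` and `train_y` stays the same as in `train`, which matters because the time-based folds are defined by row position.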

Creating the time series folds:

    fold_list <- splitTools::create_timefolds(
      y = unlist(train_y),
      k = 5L,
      use_names = TRUE,
      type = "extending"
    )
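
It can help to look at what `create_timefolds()` actually returns before handing it to another package. A small sketch on a toy response vector (the vector `y` is hypothetical, standing in for `train_y`):

```r
library(splitTools)

# Toy response with 20 ordered observations (hypothetical data)
y <- seq_len(20)

folds <- create_timefolds(y = y, k = 5L, use_names = TRUE, type = "extending")

# Each fold is itself a list holding in-sample and out-of-sample indices,
# not a single integer vector
str(folds[[1]])
names(folds[[1]])
is.integer(folds[[1]])
```

So each element of the fold list is a nested list, which is a different shape from the flat integer vectors that `splitTools::create_folds()` produces.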

Setting arguments and parameter grid:

    # required learner arguments, not optimized
    learner_args <- list(
      max_depth = -1L,
      verbose = -1L,
      objective = "regression",
      metric = "l2"
    )

Set the arguments for the predict function and the performance metric, required for mlexperiments::MLCrossValidation and mlexperiments::MLNestedCV:

    predict_args <- NULL
    performance_metric <- metric("rmse")
    performance_metric_args <- NULL
    return_models <- TRUE

Required for grid search:

    parameter_grid <- expand.grid(
      bagging_fraction = seq(0.6, 0.8, .2),
      feature_fraction = seq(0.6, 0.8, .2),
      min_data_in_leaf = seq(20, 40, 4),
      learning_rate = seq(0.1, 0.2, 0.1),
      num_leaves = seq(2, 20, 4)
    )

    optim_args <- list(
      iters.n = ncores,
      kappa = 3.5,
      acq = "ucb"
    )
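
As a side note on cost, the size of the grid above can be checked before tuning. This sketch rebuilds the same grid and gives a rough count of inner-loop model fits, assuming every candidate is fitted on each of the 3 inner folds within each of the 5 outer folds (refits of the best setting are not counted):

```r
parameter_grid <- expand.grid(
  bagging_fraction = seq(0.6, 0.8, .2),
  feature_fraction = seq(0.6, 0.8, .2),
  min_data_in_leaf = seq(20, 40, 4),
  learning_rate = seq(0.1, 0.2, 0.1),
  num_leaves = seq(2, 20, 4)
)

# Number of hyperparameter combinations in the grid
n_candidates <- nrow(parameter_grid)

outer_folds <- 5L  # k used for create_timefolds()
inner_folds <- 3L  # k_tuning in the nested CV

# Rough lower bound on the number of model fits in the nested setup
n_fits <- n_candidates * outer_folds * inner_folds

n_candidates
n_fits
```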

Tuning the model:

    tuner <- mlexperiments::MLTuneParameters$new(
      learner = mllrnrs::LearnerLightgbm$new(
        metric_optimization_higher_better = FALSE
      ),
      strategy = "grid",
      ncores = ncores,
      seed = seed
    )

    tuner$parameter_grid <- parameter_grid
    tuner$learner_args <- learner_args
    tuner$set_data(x = train_x, y = train_y)
    tuner_results_grid <- tuner$execute(k = 3)

Up to this point the code runs without problems.

But when I start the nested cross-validation:

    validator <- mlexperiments::MLNestedCV$new(
      learner = mllrnrs::LearnerLightgbm$new(
        metric_optimization_higher_better = FALSE
      ),
      strategy = "grid",
      fold_list = fold_list,
      k_tuning = 3L,
      ncores = ncores,
      seed = seed
    )
    validator$parameter_grid <- parameter_grid
    validator$learner_args <- learner_args
    validator$split_type <- "stratified"
    validator$predict_args <- predict_args
    validator$performance_metric <- performance_metric
    validator$performance_metric_args <- performance_metric_args
    validator$return_models <- return_models
    validator$set_data(
      x = train_x,
      y = train_y
    )
    validator_results <- validator$execute()

I get this error:

    CV fold: Fold1
    Error in kdry::mlh_subset(private$x, train_index) :
      ids must be an integer

When I checked the validator environment, I found that the following line in my code

    fold_list = fold_list

is not working. It seems that mlexperiments and mllrnrs do not accept the time series splitting output, which contains in-sample and out-of-sample indices for each fold.

How can I resolve this? Why do mlexperiments and mllrnrs not support time series splitting?
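
One thing that may be worth trying: the error message suggests that each fold is expected to be a single integer vector of training rows, whereas the time folds are nested lists. A minimal sketch of flattening them follows. The list `nested_folds` is a hand-built stand-in for `fold_list`, and the assumption that mlexperiments takes the complement of the training rows as the held-out set is a guess, not something confirmed in this thread:

```r
# Hand-built stand-in for the nested output of splitTools::create_timefolds()
# (assumed shape: one list per fold with `insample` and `outsample` indices)
nested_folds <- list(
  Fold1 = list(insample = 1:4, outsample = 5:6),
  Fold2 = list(insample = 1:6, outsample = 7:8),
  Fold3 = list(insample = 1:8, outsample = 9:10)
)

# Keep only the in-sample (training) indices and force integer storage,
# so that an is.integer() check on each fold would pass
flat_folds <- lapply(nested_folds, function(fold) as.integer(fold$insample))

str(flat_folds)

# If the held-out rows are derived as the complement of the training rows
# (assumption), Fold1 would be evaluated on rows 5 to 10, not only on the
# original outsample block 5:6
n_rows <- 10L
setdiff(seq_len(n_rows), flat_folds$Fold1)
```

In the real pipeline `flat_folds` would be built from `fold_list` and passed as `fold_list = flat_folds`. Two caveats under these assumptions: the evaluation window per fold may be wider than the `outsample` block, and `split_type = "stratified"` for the inner tuning folds is not time-aware.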
