Tuning window length for rolling time series features

Ada_Nick · June 24, 2020, 11:16am

I'm trying to use recipes to calculate rolling statistics of a time series to use as predictors for a classification problem. I've tried three different recipe steps, but each seems to have limitations:

step_window: can only calculate one statistic per step, supports only a limited set of statistics, and tuning the window length fails with "Error in window/2 : non-numeric argument to binary operator"

step_mutate_at: can't pass arguments to user-defined rolling functions, and can't be tuned

step_slidify: can't be tuned, and accepts only one function per step

Is there a step that allows rolling calculation of any user-defined statistic(s) over a tunable window length, and that returns the new predictors alongside the original predictors?
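To make the request concrete, here is roughly the behaviour I'm after, sketched in base R outside of recipes. `rolling_features()` and its arguments are hypothetical names for illustration only, not an existing recipes or timetk function, and the loop is written for clarity rather than speed.

```r
# Hypothetical helper (not part of recipes): right-aligned rolling statistics.
# `x` is a numeric vector, `window` the window length, and `stats` a named
# list of functions that each reduce a numeric vector to a single number.
rolling_features <- function(x, window, stats) {
  stopifnot(is.numeric(x), window >= 1, window <= length(x))
  n <- length(x)
  out <- lapply(stats, function(f) {
    res <- rep(NA_real_, n)  # the first window - 1 values stay NA
    for (end in window:n) {
      res[end] <- f(x[(end - window + 1):end])
    }
    res
  })
  as.data.frame(out)
}

# slope of a least-squares line through the values in the window
slope <- function(v) {
  unname(coef(.lm.fit(cbind(1, seq_along(v)), v))[2])
}

set.seed(1)
x <- rnorm(50)
feats <- rolling_features(x, window = 5, stats = list(mean = mean, trend = slope))

# new predictors sit alongside the original one
new_data <- cbind(data.frame(x = x), setNames(feats, paste0("x_", names(feats))))
head(new_data, 6)
```

Ideally `window` here would be something I could mark with tune() and `stats` could hold several functions in a single step.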

library(tidymodels)
library(zoo)
library(timetk)

# dummy time series data
data = data.frame(x = rnorm(500), y = as.factor(sample(c(0, 1), 500, replace = TRUE)))

# slope of a least-squares line fitted to the values in x against their index
trend = function(x){
  axis = seq_along(x)
  slope_value = coef(.lm.fit(cbind(1, axis), x))[2]

  return(slope_value)
}

# right-aligned rolling slope over `window` observations (the first window - 1 values are NA)
rolling_trend = function(X, window = 24){
  output = rollapply(X, window, FUN = trend, fill = NA, align = "right")
  return(output)
}

window = 24
feature_names = colnames(data)[1]

feature_recipe <-  recipes::recipe(y ~ ., data = data) %>%
  #step_mutate_at(all_predictors(), fn = list(trend = rolling_trend), role = "predictor") %>% # can't give arguments to fn, can't tune window
  #step_window(all_predictors(), role = "predictor", statistic = "mean", size = window-1, names = paste0(feature_names, "_trend")) %>% # limited statistics
  step_slidify(all_predictors(), period = window, align = "right", .f = trend, names = paste0(feature_names, "_trend")) %>% # can't tune period
  step_naomit(all_predictors(), skip = TRUE)

# note: steps with skip = TRUE are not applied by bake(), so the leading NA rows remain here
baked_data = bake(prep(feature_recipe), new_data = data)

logit_spec <- logistic_reg(mode = "classification", penalty = tune(), mixture = 0.5) %>%
  set_engine("glmnet")

model_cv = rsample::rolling_origin(data, initial = 250, assess = 250, skip = 250)

model_workflow <- workflow() %>%
  add_recipe(feature_recipe) %>%
  add_model(logit_spec)

results = model_workflow %>% 
  tune_grid(grid = 5, resamples = model_cv)
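
One partial workaround for the step_mutate_at limitation might be a function factory: bind the window length in a closure so that the step only ever sees a one-argument function. Below is a minimal base R sketch of that idea; `make_rolling()` is a hypothetical helper name, not something from recipes. It only addresses the argument-passing problem. The window is still fixed when the recipe is built, so tune() can't vary it.

```r
# Hypothetical function factory: returns a one-argument rolling function
# with the statistic `f` and the window length fixed in its closure.
make_rolling <- function(f, window) {
  force(f)
  force(window)
  function(x) {
    n <- length(x)
    res <- rep(NA_real_, n)  # right-aligned: the first window - 1 values stay NA
    if (n >= window) {
      for (end in window:n) {
        res[end] <- f(x[(end - window + 1):end])
      }
    }
    res
  }
}

# slope of a least-squares line through the values in the window
slope <- function(v) {
  unname(coef(.lm.fit(cbind(1, seq_along(v)), v))[2])
}

# one single-argument function per candidate window length
candidate_windows <- c(12, 24, 48)
rolling_fns <- lapply(candidate_windows, function(w) make_rolling(slope, w))
names(rolling_fns) <- paste0("trend_", candidate_windows)

set.seed(1)
x <- rnorm(100)
features <- as.data.frame(lapply(rolling_fns, function(fn) fn(x)))
colSums(is.na(features))  # each column has window - 1 leading NAs
```

Any one of these could presumably be passed as fn = list(trend = rolling_fns$trend_24) in the commented-out step_mutate_at line above, but comparing window lengths would still mean building one recipe per window by hand.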
