I'm trying to use recipes to calculate rolling statistics of a time series to use as predictors in a classification problem. I've tried three different recipe steps, but each has limitations:
step_window: can only calculate one statistic per step, supports only a limited set of statistics, and tuning the window length gives the error "Error in window/2 : non-numeric argument to binary operator"
step_mutate_at: can't pass arguments to user-defined rolling functions, and can't be tuned
step_slidify: can't be tuned, and accepts only one function per step
Is there a step that allows rolling calculation of any user-defined statistic(s) over a tunable window length and returns the new predictors alongside the original predictors?
library(tidymodels)
library(zoo)
library(timetk)
# dummy time series data
data = data.frame(x = rnorm(500), y = as.factor(sample(c(0, 1), 500, replace = TRUE)))
# slope of a least-squares line fitted through the values of one window
trend = function(x){
  axis = seq_along(x)
  slope_value = coef(.lm.fit(cbind(1, axis), x))[2]
  return(slope_value)
}
# same slope, computed over a right-aligned rolling window (leading values are NA)
rolling_trend = function(X, window = 24){
  output = rollapply(X, window, fill = NA, align = "right", function(i){
    axis = 1:window
    slope_values = coef(.lm.fit(cbind(1, axis), i))[2]
    return(slope_values)
  })
  return(output)
}
window = 24
feature_names = colnames(data)[1]
feature_recipe <- recipes::recipe(y ~ ., data = data) %>%
  #step_mutate_at(all_predictors(), fn = list(trend = rolling_trend), role = "predictor") %>% # can't give arguments to fn, can't tune window
  #step_window(all_predictors(), role = "predictor", statistic = "mean", size = window-1, names = paste0(feature_names, "_trend")) %>% # limited statistics
  step_slidify(all_predictors(), period = window, align = "right", .f = trend, names = paste0(feature_names, "_trend")) %>% # can't tune period
  step_naomit(all_predictors(), skip = TRUE)
baked_data = bake(feature_recipe %>% prep(), data)
logit_spec <- logistic_reg(mode = "classification", penalty = tune(), mixture = 0.5) %>%
  set_engine("glmnet")
model_cv = rsample::rolling_origin(data, initial = 250, assess = 250, skip = 250)
model_workflow <- workflow() %>%
  add_recipe(feature_recipe) %>%
  add_model(logit_spec)
results = model_workflow %>%
  tune_grid(grid = 5, resamples = model_cv)
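To make the request concrete, here is a minimal base-R sketch of the behaviour I'm after, written outside of recipes. `add_rolling_stats` and its arguments (`cols`, `stats`, `window`) are hypothetical names made up for illustration; they are not functions or parameters from recipes, zoo, or timetk. The window length is an ordinary function argument here, which is the part I would like to be able to mark with tune() in a recipe step.

```r
# Hypothetical helper (not part of recipes, zoo, or timetk): applies each
# function in `stats` over a right-aligned rolling window of length `window`
# and appends the results as new columns next to the original ones.
add_rolling_stats <- function(df, cols, stats, window) {
  n <- nrow(df)
  for (col in cols) {
    for (stat_name in names(stats)) {
      f <- stats[[stat_name]]
      out <- rep(NA_real_, n)          # first window - 1 rows stay NA
      if (n >= window) {
        for (i in window:n) {
          out[i] <- f(df[[col]][(i - window + 1):i])
        }
      }
      # new column name: <column>_<statistic>_<window>
      df[[paste(col, stat_name, window, sep = "_")]] <- out
    }
  }
  df
}

# Slope of a least-squares line through the values of one window
trend <- function(x) {
  axis <- seq_along(x)
  unname(.lm.fit(cbind(1, axis), x)$coefficients[2])
}

set.seed(1)
toy <- data.frame(x = rnorm(50))
res <- add_rolling_stats(
  toy,
  cols   = "x",
  stats  = list(mean = mean, trend = trend),
  window = 5
)
head(res, 8)
```

Each statistic becomes its own column (`x_mean_5`, `x_trend_5`), the original `x` is kept, and several statistics are computed in one call, which covers everything except the tuning itself.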