Hey all, I need some help building an rset object with user-defined folds using the rsample package. The goal is an rset object for time series data where the splits follow periods that I specify, so that I can pass it to tune::tune_bayes.
For the data given below, I would like each fold to advance by 4 weeks, so that the splits come out as follows:
fold 1 train min/max = 2017-01-07 to 2020-03-28; fold 1 test min/max = 2020-04-04 to 2020-08-15
fold 2 train min/max = 2017-01-07 to 2020-04-25; fold 2 test min/max = 2020-05-02 to 2020-09-12
fold 3 train min/max = 2017-01-07 to 2020-05-23; fold 3 test min/max = 2020-05-30 to 2020-10-10
fold 4 train min/max = 2017-01-07 to 2020-06-20; fold 4 test min/max = 2020-06-27 to 2020-11-07
fold 5 train min/max = 2017-01-07 to 2020-07-18; fold 5 test min/max = 2020-07-25 to 2020-12-05
The reprex data is:
dput(df_reprex)
structure(list(period = structure(c(17173, 17180, 17187, 17194,
17201, 17208, 17215, 17222, 17229, 17236, 17243, 17250, 17257,
17264, 17271, 17278, 17285, 17292, 17299, 17306, 17313, 17320,
17327, 17334, 17341, 17348, 17355, 17362, 17369, 17376, 17383,
17390, 17397, 17404, 17411, 17418, 17425, 17432, 17439, 17446,
17453, 17460, 17467, 17474, 17481, 17488, 17495, 17502, 17509,
17516, 17523, 17530, 17537, 17544, 17551, 17558, 17565, 17572,
17579, 17586, 17593, 17600, 17607, 17614, 17621, 17628, 17635,
17642, 17649, 17656, 17663, 17670, 17677, 17684, 17691, 17698,
17705, 17712, 17719, 17726, 17733, 17740, 17747, 17754, 17761,
17768, 17775, 17782, 17789, 17796, 17803, 17810, 17817, 17824,
17831, 17838, 17845, 17852, 17859, 17866, 17873, 17880, 17887,
17894, 17896, 17901, 17908, 17915, 17922, 17929, 17936, 17943,
17950, 17957, 17964, 17971, 17978, 17985, 17992, 17999, 18006,
18013, 18020, 18027, 18034, 18041, 18048, 18055, 18062, 18069,
18076, 18083, 18090, 18097, 18104, 18111, 18118, 18125, 18132,
18139, 18146, 18153, 18160, 18167, 18174, 18181, 18188, 18195,
18202, 18209, 18216, 18223, 18230, 18237, 18244, 18251, 18258,
18265, 18272, 18279, 18286, 18293, 18300, 18307, 18314, 18321,
18328, 18335, 18342, 18349, 18356, 18363, 18370, 18377, 18384,
18391, 18398, 18405, 18412, 18419, 18426, 18433, 18440, 18447,
18454, 18461, 18468, 18475, 18482, 18489, 18496, 18503, 18510,
18517, 18524, 18531, 18538, 18545, 18552, 18559, 18566, 18573,
18580, 18587, 18594, 18601), class = "Date"), units = c(1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), class = "data.frame", row.names = c(NA,
-206L))
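To make the target concrete, here is a minimal sketch of one way the folds listed above might be built by hand with rsample::make_splits() and rsample::manual_rset(). It is not necessarily the idiomatic route. The names `df`, `train_ends` and `test_weeks` are illustrative, and `df` is a regular weekly stand-in for df_reprex so the snippet runs on its own. The 20-week test window is inferred from the listed dates (2020-04-04 to 2020-08-15 spans 20 weekly periods), so treat it as an assumption.

```r
library(rsample)

# Stand-in for df_reprex (assumption): weekly dates over the same range.
# Swap in the real data frame when running against the reprex.
df <- data.frame(
  period = seq(as.Date("2017-01-07"), as.Date("2020-12-05"), by = "week"),
  units  = 1
)

# Assumed rule: the last training date moves forward 4 weeks per fold,
# and each test window covers the 20 weeks after that cutoff.
train_ends <- seq(as.Date("2020-03-28"), by = "4 weeks", length.out = 5)
test_weeks <- 20

splits <- lapply(train_ends, function(cutoff) {
  # Select rows by date rather than by position, so an off-grid row
  # in the data does not shift the fold boundaries.
  analysis_idx   <- which(df$period <= cutoff)
  assessment_idx <- which(df$period > cutoff &
                            df$period <= cutoff + 7 * test_weeks)
  make_splits(
    list(analysis = analysis_idx, assessment = assessment_idx),
    data = df
  )
})

# Bundle the individual splits into an rset
folds <- manual_rset(splits, ids = paste0("Fold", seq_along(splits)))

# Inspect the train and test date range of each fold
for (s in folds$splits) {
  print(range(analysis(s)$period))
  print(range(assessment(s)$period))
}
```

If this is on the right track, `folds` should be usable as the `resamples` argument of tune::tune_bayes, but I have not confirmed that this is the recommended way to do it.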