Hi,
I have a model that I am using to classify a binary outcome per day.
We use profiling of entities to achieve this. More concretely, as each day progresses, the results from the previous day are incorporated into the profile data to help the model predict. I want to ascertain the accuracy of our model, and I thought the best way to do this would be to use sliding_period() from the rsample package.
Below is an example of an attempt at cross-validation. The profile data are essentially features derived from the train dataset, alongside the regular features of the dataset. I have only split them out for clarity.
The second slice updates the profile data based on the actual results of the assessment. More specifically, suppose the profile of an entity had 20 positive cases in the 90-day profile and the assessment contains another 10 positive cases. These 10 cases would be added to the 20 in the second slice (91 days) to assess the model on slice two. Below is a rough picture. The assessment set would also have access to the same features.
Profile (90 days)          train         Assessment (+1 Day)
|------------------|   + |----------|    |---------|
Profile (91 days)          train         Assessment (+2 Day)
|-------------------|  + |----------|    |---------|
Profile (92 days)          train         Assessment (+3 Day)
|--------------------| + |----------|    |---------|
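To make the profile arithmetic concrete, here is a minimal sketch of how a growing per-entity count could be computed so that each day only sees positives from earlier days. The toy data and the column names entity, positives and profile_positives are assumptions for illustration, not the real dataset.

```r
library(dplyr)

# Toy data (hypothetical): one row per entity per day, with the number of
# positive cases observed on that day.
daily <- tibble(
  entity    = c("a", "a", "a", "b", "b", "b"),
  my_date   = as.Date("2020-03-01") + c(0, 1, 2, 0, 1, 2),
  positives = c(20, 10, 5, 3, 0, 2)
)

profile <- daily %>%
  arrange(entity, my_date) %>%
  group_by(entity) %>%
  # Profile available on a given day = positives seen strictly before that day,
  # so the current day's outcome never leaks into its own features.
  mutate(profile_positives = cumsum(positives) - positives) %>%
  ungroup()

profile
```

For entity "a", the third day sees a profile of 30 positives (20 + 10), which matches the 20-then-10 example described above.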
I have two questions:
1. Does rsample have something that would allow me to achieve this?
2. I thought sliding_period() would have been perfect for this, but when I run the code below on the dates listed further down, I get the following error:
> resamples <- sliding_period(
+ train,
+ my_date,
+ "day",
+ lookback = Inf,
+ assess_stop = 1,
+ skip = 4,
+ step = 2
+ )
Error: `.i` must be in ascending order.
i It is not ascending at locations: 217732, 217992, 218004, 217....
Run `rlang::last_error()` to see where the error occurred.
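This error is raised when the index passed to the sliding window is not sorted in ascending order, so one thing worth trying is sorting the data by my_date before resampling. A minimal sketch on toy data, assuming unsorted rows are the cause here (the toy data frame and its y column are hypothetical):

```r
library(dplyr)
library(rsample)

set.seed(1)

# Toy data (hypothetical): 10 days, 3 rows per day, deliberately shuffled
# so that my_date is NOT in ascending order.
toy <- tibble(
  my_date = sample(rep(as.Date("2020-03-01") + 0:9, each = 3)),
  y       = rbinom(30, 1, 0.5)
)

# Sort by the index column before calling sliding_period().
toy_sorted <- arrange(toy, my_date)

resamples <- sliding_period(
  toy_sorted,
  my_date,
  "day",
  lookback = Inf,   # expanding window: all earlier days go into the analysis set
  assess_stop = 1,  # assess on the single following day
  skip = 4,
  step = 2
)

# Inspect the first split: analysis dates should all precede assessment dates.
first_split <- resamples$splits[[1]]
range(analysis(first_split)$my_date)
range(assessment(first_split)$my_date)
```

With lookback = Inf the analysis set grows with each slice, which is the expanding-window shape sketched in the picture above. Updating the profile features per slice would still need to be done separately.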
Below is the list of dates I am using to partition the data, with the number of rows per date:
janitor::tabyl(train$my_date)
train$my_date n percent
2020-03-01 577 0.002646631
2020-03-02 3039 0.013939536
2020-03-03 5090 0.023347232
2020-03-04 3172 0.014549591
2020-03-05 2999 0.013756060
2020-03-06 2916 0.013375349
2020-03-07 1649 0.007563769
2020-03-08 456 0.002091618
2020-03-09 2863 0.013132244
2020-03-10 3162 0.014503722
2020-03-11 3238 0.014852325
2020-03-12 3028 0.013889080
2020-03-13 3206 0.014705545
2020-03-14 1814 0.008320605
2020-03-15 535 0.002453982
2020-03-16 3173 0.014554178
2020-03-17 3248 0.014898194
2020-03-18 3129 0.014352355
2020-03-19 3093 0.014187227
2020-03-20 3204 0.014696371
2020-03-21 1643 0.007536248
2020-03-22 344 0.001577888
2020-03-23 2904 0.013320307
2020-03-24 2988 0.013705605
2020-03-25 2775 0.012728599
2020-03-26 2634 0.012081848
2020-03-27 2808 0.012879966
2020-03-28 1637 0.007508727
2020-03-29 498 0.002284267
2020-03-30 2811 0.012893727
2020-03-31 2819 0.012930422
2020-04-01 2610 0.011971763
2020-04-02 2618 0.012008458
2020-04-03 3287 0.015077083
2020-04-04 981 0.004499732
2020-04-05 431 0.001976946
2020-04-06 1740 0.007981175
2020-04-07 3350 0.015366056
2020-04-08 2971 0.013627628
2020-04-09 2759 0.012655209
2020-04-10 2512 0.011522249
2020-04-11 1410 0.006467504
2020-04-12 420 0.001926491
2020-04-13 2284 0.010476439
2020-04-14 2727 0.012508428
2020-04-15 3041 0.013948709
2020-04-16 2985 0.013691844
2020-04-17 3114 0.014283552
2020-04-18 884 0.004054804
2020-04-19 396 0.001816405
2020-04-20 2014 0.009237981
2020-04-21 2021 0.009270089
2020-04-22 3235 0.014838565
2020-04-23 2846 0.013054267
2020-04-24 2889 0.013251503
2020-04-25 976 0.004476797
2020-04-26 743 0.003408054
2020-04-27 2935 0.013462500
2020-04-28 3019 0.013847798
2020-04-29 2966 0.013604693
2020-04-30 3692 0.016934770
2020-05-01 1345 0.006169357
2020-05-02 817 0.003747483
2020-05-03 417 0.001912730
2020-05-04 2535 0.011627747
2020-05-05 2334 0.010705784
2020-05-06 2973 0.013636801
2020-05-07 2936 0.013467087
2020-05-08 3089 0.014168880
2020-05-09 1572 0.007210579
2020-05-10 270 0.001238458
2020-05-11 3213 0.014737653
2020-05-12 3360 0.015411925
2020-05-13 3227 0.014801870
2020-05-14 3241 0.014866086
2020-05-15 3508 0.016090784
2020-05-16 1479 0.006783999
2020-05-17 559 0.002564067
2020-05-18 3441 0.015783462
2020-05-19 3657 0.016774229
2020-05-20 3733 0.017122832
2020-05-21 3212 0.014733066
2020-05-22 3322 0.015237623
2020-05-23 1502 0.006889497
2020-05-24 303 0.001389825
2020-05-25 2290 0.010503961
2020-05-26 3164 0.014512896
2020-05-27 3093 0.014187227
2020-05-28 3126 0.014338594
2020-05-29 3292 0.015100017
2020-05-30 1109 0.005086853
2020-05-31 586 0.002687913